BritishGeologicalSurvey / pyagsapi

pyagsapi - An AGS Utilities API with AGS v4.x Schema Validation & Converter for .ags<-->.xlsx files
https://britishgeologicalsurvey.github.io/pyagsapi/
GNU Lesser General Public License v3.0

Validate SAMP group #47

Closed: ximenesuk closed 3 years ago

ximenesuk commented 3 years ago

This checks the rules for sample IDs in the SAMP group: that each ID is present, unique and consistent. It also checks that sample IDs in other groups follow the same rules and, if valid, that they appear in the SAMP group.
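As a rough illustration of these rules, here is a minimal sketch in plain Python. It is not the code in this pull request (which appears to use pandas, given the SettingWithCopyWarning mentioned below). The helper names `sample_key`, `check_sample_ids` and `check_references` are hypothetical, records are assumed to be dicts of strings, and the "consistent" check is left out because its exact meaning is not spelled out here.

```python
from collections import Counter

# Composite key headings, as named in the validator's error messages.
COMPOSITE_FIELDS = ("LOCA_ID", "SAMP_TOP", "SAMP_TYPE", "SAMP_REF")
KEY_DESC = "SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF)"


def sample_key(record):
    """Return SAMP_ID if set, else the complete composite key, else None."""
    if record.get("SAMP_ID"):
        return record["SAMP_ID"]
    parts = [record.get(field) for field in COMPOSITE_FIELDS]
    if all(parts):  # every component must be non-empty
        return ",".join(parts)
    return None


def check_sample_ids(group, records):
    """Hypothetical helper: report missing and duplicated sample IDs."""
    errors = []
    keys = [sample_key(record) for record in records]
    for number, key in enumerate(keys, start=1):
        if key is None:
            errors.append(f"Group: {group} - Record {number} is missing either {KEY_DESC}")
    counts = Counter(key for key in keys if key is not None)
    for key, count in counts.items():
        if count > 1:
            errors.append(f"Group: {group} - Duplicate sample id {key}: {KEY_DESC} must be unique")
    return errors, set(counts)


def check_references(group, records, samp_ids):
    """Hypothetical helper: valid IDs in another group must exist in SAMP."""
    errors, ids = check_sample_ids(group, records)
    for key in sorted(ids - samp_ids):
        errors.append(f"Group: {group} - Sample id {key} not found in SAMP group")
    return errors


# Invented example records, not taken from any of the files discussed here.
samp = [
    {"SAMP_ID": "", "LOCA_ID": "BH01", "SAMP_TOP": "2.8", "SAMP_TYPE": "B", "SAMP_REF": "8"},
    {"SAMP_ID": "", "LOCA_ID": "BH01", "SAMP_TOP": "3.5", "SAMP_TYPE": "", "SAMP_REF": ""},
]
other = [
    {"SAMP_ID": "", "LOCA_ID": "BH02", "SAMP_TOP": "1.0", "SAMP_TYPE": "D", "SAMP_REF": "1"},
]
samp_errors, samp_ids = check_sample_ids("SAMP", samp)
other_errors = check_references("SHBT", other, samp_ids)
```

In this sketch the second SAMP record is reported as missing because its composite ID is incomplete, and the record in the other group is reported because its ID does not appear in SAMP. The wording of that last message is made up for the example.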

The test file, Wigan Depot.ags, shows missing sample IDs; in this case the composite IDs are incomplete.

The unit tests pass, but two cases generate SettingWithCopyWarning. I can't track down the source of this, but I noticed that other tests also produce warnings in the logs.

volcan01010 commented 3 years ago

I ran this against some real data files and it found errors in them:

================================================================================
Southwark.ags: 12 error(s) found in file!

# Metadata

File size: 20990 bytes
Checkers: ['bgs_rules v2.0.0']
Time: 2021-09-24 09:52:14.572958+00:00

# Errors

## Sample Referencing

Group: SAMP - Record 1 is missing either SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF)
Group: SAMP - Record 28 is missing either SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF)
Group: SAMP - Record 29 is missing either SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF)
Group: SAMP - Record 30 is missing either SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF)
Group: SAMP - Record 31 is missing either SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF)
Group: SAMP - Record 32 is missing either SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF)
Group: SAMP - Record 33 is missing either SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF)
Group: SAMP - Record 34 is missing either SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF)
Group: SAMP - Record 35 is missing either SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF)
Group: SAMP - Record 36 is missing either SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF)
Group: SAMP - Record 37 is missing either SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF)
Group: SAMP - Record 47 is missing either SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF)
================================================================================
A112794-16 Glenally_Road_Factual_FINAL.ags: 3 error(s) found in file!

# Metadata

File size: 16701 bytes
Checkers: ['bgs_rules v2.0.0']
Time: 2021-09-24 09:52:05.240576+00:00

# Errors

## Spatial Referencing

Group: LOCA - Spatial referencing system not in LOCA_GREF, LOCA_LREF or LOCA_LLZ!

## LOCA within Great Britain

Group: LOCA - NATE / NATN outside Great Britain and Northern Ireland (BH01)

## Sample Referencing

Group: SHBT - Duplicate sample id BH01,2.8,B,8: SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF) must be unique

================================================================================
volcan01010 commented 3 years ago

I'd never seen the set() <= set() notation before. That's nice.
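For anyone else who has not met it: in Python, `<=` between two sets is the subset test, the operator form of `set.issubset`. A minimal sketch follows. The headings are the ones named in the error messages above, but the sets themselves are invented for illustration, and the actual expression used in this pull request is not shown in the thread.

```python
# Illustrative data only: these sets are made up for the example.
required = {"LOCA_ID", "SAMP_TOP", "SAMP_TYPE", "SAMP_REF"}
present = {"LOCA_ID", "SAMP_ID", "SAMP_TOP", "SAMP_TYPE", "SAMP_REF"}

# a <= b is True when every element of a is also in b (a is a subset of b).
all_required_present = required <= present

# It gives the same answer as the method form.
same_answer = required.issubset(present)

# a < b is the strict version: a subset of b and not equal to it.
strict_subset = required < present

# When the test fails, set difference lists what is missing.
incomplete = {"LOCA_ID", "SAMP_TOP"}
is_complete = required <= incomplete
missing = required - incomplete
```

The operator form requires both operands to be sets, whereas `issubset` accepts any iterable as its argument.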