BritishGeologicalSurvey / pyagsapi

pyagsapi - An AGS Utilities API with AGS v4.x Schema Validation & Converter for .ags<-->.xslx files
https://britishgeologicalsurvey.github.io/pyagsapi/
GNU Lesser General Public License v3.0
12 stars 2 forks source link

SAMP_ID validation error TRIT & SUCT groups #54

Closed KoalaGeo closed 1 year ago

KoalaGeo commented 2 years ago

Reported error: "On running the validator I got a message that the TRIT group failed due to (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF) must be unique but SPEC_DPTH,SPEC_REF and TRIT_TESN should be included to make the records unique"

Group: TRIT - Duplicate sample id F-6LDR6B-5IJN: SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF) must be unique 
Group: TRIT - Duplicate sample id F-ARNH6B-4YMR: SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF) must be unique 
Group: TRIT - Duplicate sample id F-MKCF7B-6YR5: SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF) must be unique 
Group: TRIT - Duplicate sample id F-ZHKX7B-FCKL: SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF) must be unique

Where TRIT is present need to concatenate SAMP_ID + TRIT_TESN as that's the unique ID. SAMP_ID is insufficient on it's own as a single sample is tested 3 times with triaxle testing.

The current validator only looks at the SAMPLE table & SAMP_ID or the Key Attributes.

KoalaGeo commented 2 years ago

Work starting on https://github.com/BritishGeologicalSurvey/pyagsapi/tree/trit-group-check

Requires change to logic in https://github.com/BritishGeologicalSurvey/pyagsapi/blob/main/app/bgs_rules.py#L289-L367# IF TRIT group is present.

Current Logic:

https://github.com/BritishGeologicalSurvey/pyagsapi/blob/main/app/bgs_rules.py#L339-L344# Needs IF Group = Trit SAMP_ID may be duplicated Must have Unique Combination of:

Use https://github.com/BritishGeologicalSurvey/pyagsapi/blob/trit-group-check/test/files/bgs_rules/DEPRWD11%20-%202022-07-12%201608%20-%20Final.ags for testing

andy-bevan commented 2 years ago

I think the problem may be more involved than just fixing the TRIT group. The code seems to assume that every child group of SAMP has a one-to-one relationship. But I think its actually the case that most of the relationships are one-to-many. This means that we need to understand and store the set of headings that define the unique key for each group that is a child of SAMP. They are defined in the AGS spec and I have transposed them into a table in oracle for reference. We could use that to create some static arrays in the python to refer to, perhaps?

KoalaGeo commented 1 year ago

I found a similar problem with the SUCT group yesterday, so there may be more

BGS data validation: Sample Referencing

Group: SUCT - Duplicate sample id WFL1: SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF) must be unique Group: SUCT - Duplicate sample id WFL10: SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF) must be unique Group: SUCT - Duplicate sample id WFL11: SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF) must be unique Group: SUCT - Duplicate sample id WFL12: SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF) must be unique Group: SUCT - Duplicate sample id WFL13: SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF) must be unique Group: SUCT - Duplicate sample id WFL14: SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF) must be unique Group: SUCT - Duplicate sample id WFL15: SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF) must be unique Group: SUCT - Duplicate sample id WFL16: SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF) must be unique Group: SUCT - Duplicate sample id WFL3: SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF) must be unique Group: SUCT - Duplicate sample id WFL5: SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF) must be unique Group: SUCT - Duplicate sample id WFL6: SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF) must be unique Group: SUCT - Duplicate sample id WFL7: SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF) must be unique Group: SUCT - Duplicate sample id WFL8: SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF) must be unique Group: SUCT - Duplicate sample id WFL9: SAMP_ID or (LOCA_ID,SAMP_TOP,SAMP_TYPE,SAMP_REF) must be unique

KoalaGeo commented 1 year ago

Proposed SUCT Group logic

IF Group = SUCT SAMP_ID may be duplicated Must have Unique Combination of:

LOCA_ID, SAMP_TOP, SAMP_REF, SAMP_TYPE, SAMP_ID, SPEC_REF & SPEC_DPTH (these are all mandatory feilds) OR SAMP_ID + SPEC_REF ELSE use code as currently written

KoalaGeo commented 1 year ago

AGS_UNIQUE_COLUMN_202302231435.csv

See attached from @andy-bevan

KoalaGeo commented 1 year ago

https://github.com/BritishGeologicalSurvey/pyagsapi/pull/73 closes issue