dfo-mar-odis / dart

DFO At-sea Reporting Template, collect Elog, CTD and sample data at sea to produce reports then upload to BioChem at the end of the mission
3 stars 1 forks source link

BC Discrete Station Validation #144

Closed upsonp closed 4 weeks ago

upsonp commented 1 month ago

This issue is to dissect Biochem validation as part of issue #141

This is the validation code run by the biochem intake script on the Discrete Station Edits table (BCS). It can be found in the BioChem account on PTran under packages.

image

create or replace PACKAGE BODY VALIDATE_DISCRETE_STATN_DATA
IS
 FUNCTION VALIDATE_DISCRETE_STATION(batch_seq_in_par IN NUMBER) RETURN VARCHAR2 IS
  error_count_var       NUMBER:=0;
  return_var            VARCHAR2(1):= 'T';
BEGIN
  -- this insert statement populated the BCSTATNDATAERRORS table with records from the BCDISCRETESTATNEDITS
  -- table that fail the specified column checks
  INSERT INTO BCSTATNDATAERRORS
  SELECT
    'BCDISCRETESTATNEDITS',
    nvl(dis_sample_key_value,'X'),      -- insert null 'DIS_SAMPLE_KEY_VALUE' errors
    'DIS_SAMPLE_KEY_VALUE',
    4001,
    created_date,
    nvl(dis_headr_collector_sample_id,'N/A'),
        batch_seq
   FROM BCDISCRETESTATNEDITS
   WHERE batch_seq = batch_seq_in_par
    AND DIS_SAMPLE_KEY_VALUE IS NULL
   UNION
-- added Oct 12, 2000 to implement fix to Mary's plankton dataset that had
-- should discuss if this is worth having. Since the cases are rare that this will happen.
-- and the processing time for this check is significant.
   SELECT
    'BCDISCRETESTATNEDITS',
    nvl(dis_sample_key_value,'X'),                                  -- insert duplicate Header records errors
    'DIS_HEADR_COLLECTOR_SAMPLE_ID',
    4054,
    created_date,
    nvl(dis_headr_collector_sample_id,'N/A'),
        batch_seq
   FROM BCDISCRETESTATNEDITS
   WHERE MISSION_DESCRIPTOR||EVENT_COLLECTOR_EVENT_ID||EVENT_COLLECTOR_STN_NAME||dis_headr_collector_sample_id
   IN(
    select  MISSION_DESCRIPTOR||EVENT_COLLECTOR_EVENT_ID||
        EVENT_COLLECTOR_STN_NAME||dis_headr_collector_sample_id
    FROM BCDISCRETESTATNEDITS
    WHERE batch_seq = batch_seq_in_par
        GROUP BY MISSION_DESCRIPTOR,
        EVENT_COLLECTOR_EVENT_ID, EVENT_COLLECTOR_STN_NAME,
        dis_headr_collector_sample_id
   HAVING COUNT(*)>1)
  AND batch_seq = batch_seq_in_par
  AND dis_headr_collector_sample_id IS NOT NULL
   UNION
   SELECT
    'BCDISCRETESTATNEDITS',
    nvl(dis_sample_key_value,'X'),      -- insert null 'MISSION_DESCRIPTOR' errors
    'MISSION_DESCRIPTOR',
    4001,
    created_date,
    nvl(dis_headr_collector_sample_id,'N/A'),
        batch_seq
   FROM BCDISCRETESTATNEDITS
   WHERE batch_seq = batch_seq_in_par
    AND MISSION_DESCRIPTOR IS NULL
/*
removed as per BIOCHEM meeting Sept 6,2000
UNION
   SELECT
    'BCDISCRETESTATNEDITS',
    nvl(dis_sample_key_value,'X'),      -- insert null 'DIS_HEADR_COLLECTOR_SAMPLE_ID' errors
    'DIS_HEADR_COLLECTOR_SAMPLE_ID',
    4001,
    created_date,
    nvl(dis_headr_collector_sample_id,'N/A'),
        batch_seq
   FROM BCDISCRETESTATNEDITS
   WHERE batch_seq= batch_seq_in_par
    AND dis_headr_collector_sample_id IS NULL
*/
   UNION
   SELECT
    'BCDISCRETESTATNEDITS',
    nvl(dis_sample_key_value,'X'),      -- insert null 'EVENT_COLLECTOR_EVENT_ID' errors
    'EVENT_COLLECTOR_EVENT_ID',
    4001,
    created_date,
    nvl(dis_headr_collector_sample_id,'N/A'),
        batch_seq
   FROM BCDISCRETESTATNEDITS
   WHERE batch_seq= batch_seq_in_par
    AND event_collector_event_id IS NULL;
   -- update all the process_flag field of the station records to 'SVI'
   UPDATE BCDISCRETESTATNEDITS
        SET process_flag = 'SVI'
        WHERE batch_seq = batch_seq_in_par;
   -- count the number of errors inserted to the errors table due to the above code
   --
   SELECT count(*) INTO error_count_var FROM BCSTATNDATAERRORS
    WHERE batch_seq=batch_seq_in_par
           AND STATN_DATA_TABLE_NAME = 'BCDISCRETESTATNEDITS';
   -- if there were errors logged for the data set identified by the created date then
   -- the process flags for the station and data records of the asociated records must be updated
   -- accordingly
   IF error_count_var > 0 THEN
           return_var := 'F';
       -- set the station process flag to the SVE error for all station records that had an error
       -- logged to the errors table
       UPDATE BCDISCRETESTATNEDITS
        SET process_flag='SVE'
        WHERE (DIS_SAMPLE_KEY_VALUE IN ( SELECT DISTINCT RECORD_SEQUENCE_VALUE
                        FROM BCSTATNDATAERRORS
                        WHERE STATN_DATA_TABLE_NAME= 'BCDISCRETESTATNEDITS'
                        AND batch_seq=batch_seq_in_par)
        OR DIS_SAMPLE_KEY_VALUE IS NULL)
        AND batch_seq= batch_seq_in_par;
       -- Also set the data process flag to the DVE error for all data records whose parent station record had an error
       -- logged to the errors table
       UPDATE BCDISCRETEDATAEDITS
        SET process_flag='DVE'
        WHERE DIS_SAMPLE_KEY_VALUE IN ( SELECT DISTINCT RECORD_SEQUENCE_VALUE
                        FROM BCSTATNDATAERRORS
                        WHERE STATN_DATA_TABLE_NAME= 'BCDISCRETESTATNEDITS'
                        AND batch_seq=batch_seq_in_par)
        AND batch_seq= batch_seq_in_par;
   END IF;
   RETURN return_var;
  END VALIDATE_DISCRETE_STATION;
upsonp commented 1 month ago

Let's make the code above more readable:

create or replace PACKAGE BODY VALIDATE_DISCRETE_STATN_DATA
IS
 FUNCTION VALIDATE_DISCRETE_STATION(batch_seq_in_par IN NUMBER) RETURN VARCHAR2 IS
  error_count_var       NUMBER:=0;
  return_var            VARCHAR2(1):= 'T';
BEGIN
  -- this insert statement populated the BCSTATNDATAERRORS table with records from the BCDISCRETESTATNEDITS
  -- table that fail the specified column checks
  INSERT INTO BCSTATNDATAERRORS
  1. Check if the Sample ID is null

    SELECT
    'BCDISCRETESTATNEDITS',
    nvl(dis_sample_key_value,'X'),      -- insert null 'DIS_SAMPLE_KEY_VALUE' errors
    'DIS_SAMPLE_KEY_VALUE',
    4001,
    created_date,
    nvl(dis_headr_collector_sample_id,'N/A'),
        batch_seq
    FROM BCDISCRETESTATNEDITS
    WHERE batch_seq = batch_seq_in_par
    AND DIS_SAMPLE_KEY_VALUE IS NULL
    UNION
  2. Check if the mission descriptor is null

    SELECT
    'BCDISCRETESTATNEDITS',
    nvl(dis_sample_key_value,'X'),      -- insert null 'MISSION_DESCRIPTOR' errors
    'MISSION_DESCRIPTOR',
    4001,
    created_date,
    nvl(dis_headr_collector_sample_id,'N/A'),
        batch_seq
    FROM BCDISCRETESTATNEDITS
    WHERE batch_seq = batch_seq_in_par
    AND MISSION_DESCRIPTOR IS NULL
    UNION
  3. Check if the Event ID is null

    SELECT
    'BCDISCRETESTATNEDITS',
    nvl(dis_sample_key_value,'X'),      -- insert null 'EVENT_COLLECTOR_EVENT_ID' errors
    'EVENT_COLLECTOR_EVENT_ID',
    4001,
    created_date,
    nvl(dis_headr_collector_sample_id,'N/A'),
        batch_seq
    FROM BCDISCRETESTATNEDITS
    WHERE batch_seq= batch_seq_in_par
    AND event_collector_event_id IS NULL;

Update the Discrete Station Edits (BCS) table and set the process flag to 'svi' for all data with the indicated batch sequence number.

   -- update all the process_flag field of the station records to 'SVI'
   UPDATE BCDISCRETESTATNEDITS
        SET process_flag = 'SVI'
        WHERE batch_seq = batch_seq_in_par;

Count how many errors are in the Station Errors table with the provided batch sequence number

   -- count the number of errors inserted to the errors table due to the above code
   --
   SELECT count(*) INTO error_count_var FROM BCSTATNDATAERRORS
    WHERE batch_seq=batch_seq_in_par
           AND STATN_DATA_TABLE_NAME = 'BCDISCRETESTATNEDITS';

If there were errors discovered in the code above then set the process flag on the Discrete Station Edit (BCS) table to 'SVE' (Station Value Error?).

   -- if there were errors logged for the data set identified by the created date then
   -- the process flags for the station and data records of the asociated records must be updated
   -- accordingly
   IF error_count_var > 0 THEN
           return_var := 'F';
       -- set the station process flag to the SVE error for all station records that had an error
       -- logged to the errors table
       UPDATE BCDISCRETESTATNEDITS
        SET process_flag='SVE'
        WHERE (DIS_SAMPLE_KEY_VALUE IN ( SELECT DISTINCT RECORD_SEQUENCE_VALUE
                        FROM BCSTATNDATAERRORS
                        WHERE STATN_DATA_TABLE_NAME= 'BCDISCRETESTATNEDITS'
                        AND batch_seq=batch_seq_in_par)
        OR DIS_SAMPLE_KEY_VALUE IS NULL)
        AND batch_seq= batch_seq_in_par;

Then find all the rows in the Discrete Data Edit (BCD) table linked to the stations with errors and set their process flag to 'DVE' (Data Value Error?)

       -- Also set the data process flag to the DVE error for all data records whose parent station record had an error
       -- logged to the errors table
       UPDATE BCDISCRETEDATAEDITS
        SET process_flag='DVE'
        WHERE DIS_SAMPLE_KEY_VALUE IN ( SELECT DISTINCT RECORD_SEQUENCE_VALUE
                        FROM BCSTATNDATAERRORS
                        WHERE STATN_DATA_TABLE_NAME= 'BCDISCRETESTATNEDITS'
                        AND batch_seq=batch_seq_in_par)
        AND batch_seq= batch_seq_in_par;
   END IF;
   RETURN return_var;
  END VALIDATE_DISCRETE_STATION;
upsonp commented 1 month ago

In summary, to ensure the function above passes a BCS row must have:

  1. Mission Descriptor
  2. Sample ID
  3. Event ID

Dart does not allow blank or null Sample or Event IDs, but should validate that the mission has a Mission Descriptor.

When a mission is created the mission descriptor does not have to be provided. A user is expected to supply the descriptor at a later date because it may not be initially known, although it's most often the same as the 'mission name' the user will use to create a Dart mission. when creating BCS/BCD tables, but the descriptor could be missing an a user will have to supply it on the mission settings page. image

This will have to be validated and set before the user can upload data to BCS/BCD tables.