HakaiInstitute / hakai-ctd-qc

Series of tests applied to the Hakai CTD profile data, based on the QARTOD tests and other Hakai-specific ones.

Hakai Profile QC tool integration #5

Closed JessyBarrette closed 2 years ago

JessyBarrette commented 2 years ago

Issue

The Hakai Profile QAQC tool needs to be integrated within the Hakai CTD profile workflow. This tool needs to be triggered when:

  1. Changes are made to the QC method (new tests, new thresholds).
  2. A new profile has been processed.

How to use the package

Once installed on a server, the tool could be triggered by sending a command line like:

python hakai_profile_qc.review -update_db -hakai_id [comma-delimited list of hakai_id] (ex: 080217_2017-01-08T18:03:05.167Z,080217_2017-01-26T16:56:39.000Z)

Method

Once triggered, the tool would retrieve the data through the hakai-api from the view ctd.ctd_file_cast_data. It would then run a series of tests.

The main question is how to return the resulting flags back to the database. One potential way is to send back, through the hakai-api, a JSON string of the resulting data keyed per ctd_data_pk with each individual flag (level 1 and level 2). See the following Jupyter notebook for an example: https://github.com/HakaiInstitute/hakai-profile-qaqc/blob/main/notebook/update-hakai-db-flags.ipynb
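A minimal sketch of that round trip with the hakai-api Python client might look like the following. The REST path for the ctd.ctd_file_cast_data view, the flag column names, and the write-back endpoint are assumptions for illustration, not confirmed API details.

```python
# Sketch only: view path, flag column names, and write-back endpoint are assumed.
from hakai_api import Client

client = Client()  # handles authentication against the Hakai API

hakai_id = "080217_2017-01-08T18:03:05.167Z"
data = client.get(
    f"{client.api_root}/ctd/views/file/cast/data?hakai_id={hakai_id}&limit=-1"
).json()

# ... run the QARTOD and Hakai-specific tests on `data` here ...

# One record per ctd_data_pk carrying the level 1 and level 2 flags
# (flag values hard-coded here purely for illustration).
flags = [
    {
        "ctd_data_pk": row["ctd_data_pk"],
        "temperature_flag_level_1": 1,
        "temperature_flag": "AV",
    }
    for row in data
]

# Hypothetical write-back call; the linked notebook shows the actual update.
client.put(f"{client.api_root}/ctd/flags", json=flags)
```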

How to trigger

The tool can be triggered by the server using the processing_stage column within the ctd_cast table. Any cast at an 8_* stage (8_rbr_processed or 8_binAvg) is ready to be QCed. Once QCed, the corresponding cast's processing_stage can be moved to the 9_qc_auto stage.
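For illustration, a small sketch of such a check with the hakai-api Python client could look like this; the cast view path and field names are assumptions, and the filtering is done client-side to avoid guessing the API's query syntax.

```python
# Sketch only: the cast view path and returned field names are assumed.
from hakai_api import Client

READY_STAGES = {"8_rbr_processed", "8_binAvg"}

client = Client()
casts = client.get(
    f"{client.api_root}/ctd/views/file/cast"
    "?fields=hakai_id,processing_stage&limit=1000"
).json()
ready = [c["hakai_id"] for c in casts if c["processing_stage"] in READY_STAGES]
print(f"{len(ready)} casts ready for automated QC")
```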

fostermh commented 2 years ago

I assumed the QC tool would run on a cron job, similar to the post-processing script. It would look for any cast with an 8_* processing level and do QC on it. Were you thinking something different?

JessyBarrette commented 2 years ago

Right ok sounds good to me!

JessyBarrette commented 2 years ago

The new script presented here can potentially be run on a cron job to QC any profiles not yet QCed on the server.

The script applies the following steps (a rough sketch of the loop is shown after the list):

  1. Query the database's ctd.ctd_file_cast view for any casts at the processing stages 8_rbr_processed or 8_binAvg. (A limit of 1000 was added for now since it seems like the database would fail to return limit=-1.)
  2. Then, for 30 hakai_id at a time (the limitation is the query length), download the corresponding data from ctd.ctd_file_cast_data and run the different tests.
  3. Once the QC is done, upload the resulting flags for ctd.ctd_file_cast_data back to the database through the API and update the processing_stage of the corresponding hakai_id.
  4. Start over with another 30 hakai_id until the whole set is completed.

If more than 1000 profiles need to be QCed, the remaining ones will be handled on the next cron job run.
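A rough sketch of that loop, under the same assumptions about the API paths (and with the QC and upload steps left as comments), could look like:

```python
# Sketch only: view paths and the {id1,id2,...} query filter syntax are assumed.
from hakai_api import Client

CHUNK_SIZE = 30     # limited by the maximum query string length
QUERY_LIMIT = 1000  # limit=-1 currently fails, so cap each run

READY_STAGES = {"8_rbr_processed", "8_binAvg"}
client = Client()

# Step 1: casts waiting to be QCed
casts = client.get(
    f"{client.api_root}/ctd/views/file/cast"
    f"?fields=hakai_id,processing_stage&limit={QUERY_LIMIT}"
).json()
hakai_ids = [c["hakai_id"] for c in casts if c["processing_stage"] in READY_STAGES]

# Steps 2-4: work through 30 hakai_id at a time until the set is done
for start in range(0, len(hakai_ids), CHUNK_SIZE):
    chunk = hakai_ids[start : start + CHUNK_SIZE]
    data = client.get(
        f"{client.api_root}/ctd/views/file/cast/data"
        f"?hakai_id={{{','.join(chunk)}}}&limit=-1"
    ).json()
    # ... run the QARTOD / Hakai tests on `data`,
    # upload the resulting flags through the API,
    # and move the chunk's processing_stage to 9_qc_auto ...
```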

JessyBarrette commented 2 years ago

The auto qc method was successfully integrated on the hakaidev database and applied resulting flags to the right data.

Some work needs to be done to change the data type of the qartod flag columns *_flag_level_1, which are currently strings, to integers: https://github.com/HakaiInstitute/hakai-database/pull/327

Once that is completed, the portal should be able to make available any profiles associated with the processing_stages '8_binAvg', '8_rbr_processed', '9_qc_auto', and '10_qc_pi'.

JessyBarrette commented 2 years ago

This is all working with the latest development branch commit 55a5022725f3e84b3081f1e081b0247912f7b233