hakai-ctd-qc
is the main package used to handle the quality control (QC) of the CTD
datasets maintained by the Hakai Institute. Please refer to the
test description manual for a full description of the
different tests applied within this package. Examples of tes
This package can be installed locally or run through a Docker container. In either case, it is best to clone the repository locally and apply the appropriate configuration.
git clone git@GitHub.com:HakaiInstitute/hakai-ctd-qc.git
With the repository cloned locally, create the Python environment with pyenv and Poetry:
pyenv install 3.11.2
pyenv local 3.11.2
pip install poetry
poetry install
Copy the sample.env
file to .env
and replace the values with those appropriate for your setup.
Once installed, the hakai_ctd_qc package can be run from the command line. See the help menu for a complete description of the different options:
python hakai_ctd_qc --help
Usage: hakai_ctd_qc [OPTIONS]

Options:
  --hakai_ids TEXT            Comma delimited list of hakai_ids to qc
  --processing-stages TEXT    Comma list of processing_stage profiles to
                              review  [env=QC_PROCESSING_STAGES]  [default:
                              8_binAvg,8_rbr_processed]
  --test-suite                Run Test suite  [env=RUN_TEST_SUITE]
  --api-root TEXT             Hakai API root to use  [env=HAKAI_API_ROOT]
                              [default: https://goose.hakai.org/api]
  --upload-flag               Update database flags
                              [env=UPDATE_SERVER_DATABASE]
  --chunksize INTEGER         Process profiles by chunk
                              [env=CTD_CAST_CHUNKSIZE]  [default: 100]
  --sentry-minimum-date TEXT  Minimum date to use to generate sentry warnings
                              [env=SENTRY_MINIMUM_DATE]
  --profile PATH              Run cProfile
  --help                      Show this message and exit.
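To illustrate how a comma-delimited value such as the one passed to --hakai_ids could be split and then processed in chunks of --chunksize, here is a minimal sketch. The helper names `parse_comma_list` and `iter_chunks` and the example IDs are hypothetical and are not taken from the package; the actual implementation may differ.

```python
from typing import Iterator


def parse_comma_list(value: str) -> list[str]:
    """Split a comma-delimited option value into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def iter_chunks(items: list[str], chunksize: int = 100) -> Iterator[list[str]]:
    """Yield successive slices of at most `chunksize` items."""
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")
    for start in range(0, len(items), chunksize):
        yield items[start : start + chunksize]


if __name__ == "__main__":
    # Placeholder IDs for illustration only, not real hakai_ids.
    ids = parse_comma_list("cast_a, cast_b,cast_c")
    for chunk in iter_chunks(ids, chunksize=2):
        print(chunk)
```

Processing by chunk keeps each request to the database bounded in size, which is presumably why the option exists.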
To start the API locally, run the following command:
poetry run python hakai_ctd_qc/api.py
Then open http://127.0.0.1:8000 in a browser.
With VS Code, you can also run the debug configuration Run API,
which helps debug the interface in real time.
[!IMPORTANT] To protect the API from unexpected calls, you can define the accepted tokens as a comma-separated list. Any POST call to the API will then require a field
token
within the header of the request, set to one of the accepted values.
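The token check described above can be sketched as follows. This is an illustration only: the function names, the endpoint path, and the behaviour when no tokens are configured are assumptions, not the package's actual code. The example builds a POST request with the standard library but does not send it.

```python
import urllib.request


def parse_accepted_tokens(raw: str) -> set[str]:
    """Turn a comma-separated string into a set of trimmed tokens."""
    return {token.strip() for token in raw.split(",") if token.strip()}


def is_authorized(headers: dict[str, str], accepted: set[str]) -> bool:
    """Return True if the `token` header holds one of the accepted values."""
    if not accepted:
        # Assumption: with no tokens configured, the check is disabled.
        return True
    # Header names are case-insensitive, so compare in lower case.
    token = next((v for k, v in headers.items() if k.lower() == "token"), None)
    return token in accepted


def build_post_request(url: str, token: str, body: bytes = b"") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the token header."""
    return urllib.request.Request(
        url, data=body, headers={"token": token}, method="POST"
    )


if __name__ == "__main__":
    accepted = parse_accepted_tokens("token-one, token-two")
    # The path below is a placeholder, not a documented endpoint.
    request = build_post_request("http://127.0.0.1:8000/hypothetical", "token-one")
    print(is_authorized(dict(request.header_items()), accepted))
```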
The hakai_ctd_qc tool is deployed via a Docker container (see Dockerfile) on two CapRover instances, one for the development branch and one for the production branch.
Each is associated with its respective Hakai database:
The different tests applied are defined within the respective configurations:
A subset of hakai_ids is used to test the qc tool and is maintained here
Manual flags can also be implemented on any instrument-specific variables via the grey-list, which overwrites any automatically generated flags.
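The override behaviour described above can be pictured as a merge in which manual entries take precedence. The sketch below is hypothetical: the function name, the variable names, and the flag values are placeholders, and the real grey-list is likely keyed on more than the variable name.

```python
def apply_grey_list(
    automatic_flags: dict[str, str], manual_flags: dict[str, str]
) -> dict[str, str]:
    """Merge flags per variable, letting manual (grey-list) flags win."""
    # Later entries in a dict merge overwrite earlier ones with the same key.
    return {**automatic_flags, **manual_flags}


if __name__ == "__main__":
    # Placeholder variables and flag values, for illustration only.
    automatic = {"temperature": "pass", "salinity": "pass"}
    manual = {"salinity": "reject"}
    print(apply_grey_list(automatic, manual))
```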
To make sure the tests are working appropriately, a series of pytest tests is available. Some are specific to the Hakai tests, others to the Hakai test suite.
The test suite is made available locally via the parquet file, or retrieved from the development or production database.
To run all the tests locally:
poetry run pytest .
To run all the tests with the production data (hecate) or development data (goose), use the --test-suite-from
option. Here's an example for goose:
poetry run pytest . --test-suite-from goose
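Custom pytest options such as --test-suite-from are typically registered through a `pytest_addoption` hook in a conftest.py file. Here is a minimal sketch of such a hook; the accepted values, the `local` default, and the helper `get_test_suite_source` are assumptions made for illustration and may not match the repository's own conftest.py.

```python
# conftest.py (hypothetical sketch, not the repository's actual file)


def pytest_addoption(parser):
    """Register the --test-suite-from option with pytest."""
    parser.addoption(
        "--test-suite-from",
        action="store",
        default="local",  # assumption: fall back to the local parquet file
        choices=("local", "goose", "hecate"),  # assumption: accepted sources
        help="Where to retrieve the test suite data from",
    )


def get_test_suite_source(config) -> str:
    """Read the option value back from the pytest config object."""
    return config.getoption("--test-suite-from")
```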
To test the results on any of the databases without rerunning the tests on the data, you can use the --test-suite-qc False
option.
poetry run pytest . --test-suite-from goose -k test_source_expected_results