CF quality control of CMIP6 data at CEDA
To get started you will need a list of dataset ids and a list of variables in JSON format.
Ag will generate a list of dataset ids that require CF checking. You can then run:
python cfchecker_run_all.py --file <dataset_ids-file> --qc_check cfchecker
Note: this script will tell you which datasets are 'missing', i.e. not in the CEDA Archive. You can run it multiple times; it will not repeat the QC for a dataset as long as log files show that the QC has already been run.
If you use the --file option, the script calls cfchecker_run_unit.py directly.
Running without a file uses the ABC unit approach, calling batch and chunk.
cfchecker_run_unit.py sends the jobs to Lotus. Each dataset is sent to Lotus individually, as some datasets are large.
To see the jobs in Lotus, run 'squeue -u' followed by your username.
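To summarise the queue from Python rather than reading the squeue listing by eye, a small helper along these lines may be useful. It is only a sketch and not part of this repository: the function names are hypothetical, and it assumes the standard Slurm squeue options -h (no header), -u (user) and -o "%i %T" (job id and job state).

```python
import getpass
import subprocess
from collections import Counter


def count_job_states(squeue_text):
    """Count jobs per state from the text printed by `squeue -h -o "%i %T"`."""
    states = Counter()
    for line in squeue_text.splitlines():
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            states[parts[1]] += 1
    return dict(states)


def my_job_states(user=None):
    """Run squeue for one user (default: the current user) and summarise it."""
    user = user or getpass.getuser()
    result = subprocess.run(
        ["squeue", "-h", "-u", user, "-o", "%i %T"],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_job_states(result.stdout)
```

Calling my_job_states() on a host where squeue is available returns a dictionary mapping each job state to the number of jobs in that state.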
Each job produces a CF results file, in the form of a psv file, in a directory called qc_logs.
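The results files can be inspected with Python's csv module. The sketch below assumes that psv means pipe-separated values, and it makes no assumption about what the columns contain. The function name read_psv is hypothetical.

```python
import csv


def read_psv(path):
    """Return the rows of a pipe-separated file as lists of stripped strings."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="|")
        # Blank lines come back as empty rows, so drop them.
        return [[field.strip() for field in row] for row in reader if row]
```

Each returned row is a list of strings, one per pipe-separated field, which is usually enough for a quick look at which checks were reported for a dataset.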
Once you have QC results for all the datasets you want to check (none 'missing'), run:
nohup ./create_expt_psvs.sh > /outdir/ofile.out &
nohup ./create_model_psvs.sh > /outdir/ofile.out &
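These can take a while to run. If you run both in the background at the same time, give each command its own output file so that one does not overwrite the other.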
Run python generate_c3s-34g_dataframe.py --create
TODO: re-write this so you understand what is happening
A QC_template.json file will be provided. Use the script complete_json_release_template.py
to fill out this template with the QC results.
Once complete, name the output file QC_cfchecker.json
and upload it to the appropriate GitHub branch
of https://github.com/cp4cds/c3s_34g_qc_results/ in the directory QC_Results. Although a local copy already exists on the system,
others may be working on this. You can add your results file via a pull request or you can clone your own copy and push as others
check_for_missing_psvfiles.py: this takes the dataset ids and checks that the output psv file exists for each dataset. If a file is missing, you can rerun cfchecker_run_all.py.
_*py: older files that may contain useful information. When you are confident you don't need them, remove them from GitHub and the local repo.
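The check performed by check_for_missing_psvfiles.py can be pictured with the sketch below. It is not the script's actual code: the helper name find_missing_psvs is hypothetical, and it assumes, purely for illustration, that each dataset's results file sits in qc_logs and is named after its dataset id with a .psv suffix.

```python
from pathlib import Path


def find_missing_psvs(dataset_ids, log_dir="qc_logs"):
    """Return the dataset ids that have no matching .psv file in log_dir."""
    log_dir = Path(log_dir)
    missing = []
    for ds_id in dataset_ids:
        # Assumed naming scheme: <dataset_id>.psv (hypothetical).
        if not (log_dir / f"{ds_id}.psv").is_file():
            missing.append(ds_id)
    return missing
```

Any ids returned by such a check are the ones for which cfchecker_run_all.py would need to be rerun.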