CABLE-LSM / benchcab

Tool for evaluation of CABLE land surface model
https://benchcab.readthedocs.io/en/latest/
Apache License 2.0

Add bitwise comparison step for FLUXNET test suite #76

Closed SeanBryan51 closed 1 year ago

SeanBryan51 commented 1 year ago

This change adds the ability for benchcab to run bitwise comparisons between NetCDF output files using the nccmp command. Comparisons are made between outputs that differ in their realisation but match in all other configurations (science configuration and meteorological forcing). When a comparison fails, its standard output is written to the runs/site/analysis/bitwise-comparisons directory.
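As a rough illustration of a single comparison task, the sketch below runs nccmp on two files and writes the captured standard output to a report file only when the comparison fails. It is not the code in this PR: the function name `bitwise_compare`, the report file naming, the `run` parameter (added so the command can be stubbed out) and the `-df` flags (compare data, do not stop at the first difference) are all assumptions.

```python
import subprocess
from pathlib import Path


def bitwise_compare(file_a, file_b, output_dir, run=subprocess.run):
    """Compare two NetCDF files; return True if nccmp reports no differences.

    Hypothetical helper. The nccmp flags and the report file name are
    assumptions, not taken from benchcab.
    """
    cmd = ["nccmp", "-df", str(file_a), str(file_b)]
    proc = run(cmd, capture_output=True, text=True)
    if proc.returncode == 0:
        return True

    # On failure, keep the tool's standard output for later inspection.
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    report = out_dir / f"{Path(file_a).stem}_{Path(file_b).stem}.txt"
    report.write_text(proc.stdout)
    return False
```

Passing `run` explicitly keeps the sketch testable on machines where nccmp is not installed.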

Since multiple realisations can be specified, comparisons are made between all pairwise combinations of realisations.
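The pairing rule can be sketched as follows. This is only an illustration: the tuple layout `(met_forcing, science_config, realisation)` and the name `comparison_pairs` are hypothetical and do not reflect how benchcab represents tasks.

```python
from collections import defaultdict
from itertools import combinations


def comparison_pairs(tasks):
    """Return (met_forcing, science_config, realisation_a, realisation_b) tuples.

    `tasks` is an iterable of (met_forcing, science_config, realisation)
    tuples (a hypothetical layout). Realisations are paired only when the
    meteorological forcing and science configuration both match.
    """
    groups = defaultdict(list)
    for met_forcing, science_config, realisation in tasks:
        groups[(met_forcing, science_config)].append(realisation)

    pairs = []
    for (met_forcing, science_config), realisations in sorted(groups.items()):
        # Every unordered pair of realisations within the same configuration.
        for rel_a, rel_b in combinations(sorted(realisations), 2):
            pairs.append((met_forcing, science_config, rel_a, rel_b))
    return pairs
```

With n realisations per configuration this yields n(n-1)/2 comparisons for each configuration, so three realisations give three comparisons.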

This change removes the --no-submit optional argument from benchcab fluxnet-run-tasks; a PBS job is now submitted only when running benchcab run or benchcab fluxnet. This allows the bitwise comparison step to run in the same job script used to run CABLE.

The comparison step can be run in isolation by executing benchcab fluxnet-bitwise-cmp. However, this should ideally be executed on a compute node (inside a PBS job for example).

This change also refactors the parallelisation scheme used for running CABLE tasks and comparison tasks so that workers fetch tasks from a multiprocessing.Queue object. Previously, a process that had completed its allocated CABLE tasks would remain idle until all other processes had completed theirs. With the queue, no process idles while tasks remain to be completed.
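A minimal sketch of the queue-based scheme, assuming one `None` sentinel per worker to signal that no tasks remain. The names `worker` and `run_in_parallel` are hypothetical, and the actual implementation in this PR may detect the end of the queue differently.

```python
import multiprocessing


def worker(task_queue, run_task):
    """Fetch tasks from the queue until a None sentinel is received.

    Returns the number of tasks this worker completed.
    """
    completed = 0
    while True:
        task = task_queue.get()
        if task is None:  # sentinel: no more work for this worker
            return completed
        run_task(task)
        completed += 1


def run_in_parallel(tasks, run_task, n_workers):
    """Run every task using n_workers processes sharing a single queue.

    run_task must be picklable (for example a module-level function) when
    the multiprocessing start method is not fork.
    """
    task_queue = multiprocessing.Queue()
    for task in tasks:
        task_queue.put(task)
    for _ in range(n_workers):
        task_queue.put(None)  # one sentinel per worker

    processes = [
        multiprocessing.Process(target=worker, args=(task_queue, run_task))
        for _ in range(n_workers)
    ]
    for process in processes:
        process.start()
    for process in processes:
        process.join()
```

Because each worker takes the next task as soon as it finishes the previous one, a worker that draws short tasks simply processes more of them instead of waiting on a fixed allocation.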

The comparison step can be skipped by specifying --skip fluxnet-bitwise-cmp to composite commands such as benchcab run and benchcab fluxnet.

Fixes https://github.com/CABLE-LSM/benchcab/issues/32
Fixes https://github.com/CABLE-LSM/benchcab/issues/77

SeanBryan51 commented 1 year ago

TODO

SeanBryan51 commented 1 year ago

Now using nccmp instead of cdo diffn. cdo diffn was using an unreasonable amount of memory per comparison, which caused processes to be killed. Its memory usage increases linearly with time, which may indicate a memory leak in the tool.

Tested with cdo versions 1.7.2, 1.9.8, 1.9.10 and 2.0.5.