Add benchmarking of cloudmetrics from @gmandorl's work

leifdenby commented 1 year ago

Aim: have functionality in cloudmetrics to run benchmark tests from @gmandorl's work that compute robustness measures and from these measures produce plot, and integrate this all to be auto-compiled into readthedocs.org documentation

Caveat: benchmarking will only be of mask-based metrics for now

Components needed:

Cloud-mask dataset, Himawari, 4km resolution, every 30 minutes (day and night), tropical band, 4 years, 10deg x 10deg domain. Currently as individual netCDF files (one per scene) with @gmandorl.
- needs combining to single zarr archive, packing as zip file and hosting on zenodo
Code to perform benchmark calculations: compute robustness measures by perturbing cloud masks. Robustness is measured by how percentiles of metrics values change across the entire dataset.
- needs tidying and including in cloudmetrics.benchmark.data
- reference source code in @gmandorl's repo: https://github.com/gmandorl/Assessment_of_the_object-based_indices_to_identify_convective_organization/tree/main/data_production
Code to plot benchmark metrics
- needs tidying and including in cloudmetrics.benchmark.plots
- reference source code in @gmandorl's repo: https://github.com/gmandorl/Assessment_of_the_object-based_indices_to_identify_convective_organization/tree/main/plot_production

Robustness measures:

a. Noise robustness

sensitivity to adding a small new object
sensitivity to merging two close objects

b. Intrinsic behavior

sensitivity to object's proximity
sensitivity to object size

c. Capacity to compare across diverse datasets

sensitivity to horizontal resolution
sensitivity to time resolution
sensitivity to domain limits

Milestones

1. Get cloud-mask data on zenodo and add download functionality

python -m cloudmetrics.benchmark.data

Will download and unpack to benchmark_data/.

Tasks:

@gmandorl: upload to zenodo
@leifdenby: write cloudmetrics code to download and unpack data

2. Command line script for running and producing benchmark plots

python -m cloudmetrics.benchmark.plots

Will produce plot files benchmark.noise_robustness.png, ... in the directory where the above command is run.

Tasks:

@leifdenby: implement noise robustness as an example
@gmandorl: add rest of robustness metrics

3. Inclusion of auto-generated benchmark plots in readthedocs documentation

Tasks:

@leifdenby: integrate benchmark plotting in readthedocs ci/cd
@leifdenby (and @gmandorl to review): text for benchnmarking webpage

martinjanssens commented 1 year ago

This is just wonderful, thanks so much @gmandorl and @leifdenby for taking the initiative and offering to put in the work to get this working. I'm totally agree with how you want to structure this as well. I have three additional thoughts, which I'll write in separate comments so we can discuss them separately if we want to!

martinjanssens commented 1 year ago

In our original repo, we plotted how the different metrics relate to one another, for a given dataset. Can we add that to this issue and to cloudmetrics.benchmark.plot? I'm envisioning a plot like fig. 3 of our paper. Knowing how the metric one intends to use relates to other metrics, and if a set of chosen metrics actually captures independent information, seems valuable and a nice complemen to e.g. the robustness and intrinsic behaviour tests. I'd be happy to contribute the code to do this.

martinjanssens commented 1 year ago

How do we feel about committing to a single benchmarking data set? Is it trivial that the benchmarking will behave the same for different cloud regimes (deep convection vs shallow convection, trade cumuli vs other cloud types...)? I like starting with @gmandorl's Himawari data, but I think we might want to extend to other cloud regimes if we want to serve the broader community.

martinjanssens commented 1 year ago

How do we want to combine this with a broader documentation of the metrics? Did you have a plan already, @leifdenby?

gmandorl commented 1 year ago

@martinjanssens, the 7 benchmarks mentioned above do not provide information on the relation between metrics or on the degree of independence from one another. I think that adding such a test would be absolutely great!

I agree regarding the dataset. Currently, the mask only includes high clouds in the warm-poor region. It might be beneficial to consider offering a more diverse set of options for testing.

It is not obvious that different datasets exhibit similar behavior. However, as far as I saw, the primary differentiator seems to be the average number of objects and their distribution across the domain, rather than their nature (but it would be worthwhile to test it again).

gmandorl commented 1 year ago

@leifdenby I uploaded the dataset to zenodo. It contains high cloud mask derived from the TOOCAN database. Since I used just the mask, the data I uploaded is basically a recalibrated and cleaned version of HIMAWARI.

This is the doi https://doi.org/10.5281/zenodo.8413762

gmandorl commented 1 year ago

I have adjusted the dataset's chunk size. It was initially set to 1 and has now been increased to 100.

The new doi is https://doi.org/10.5281/zenodo.8422191

cloudsci / cloudmetrics