digital-earths-global-hackathon / digital-earths-global-hackathon.github.io

https://digital-earths-global-hackathon.github.io/planning/
BSD 3-Clause "New" or "Revised" License

standardizing the standardized grid #6

Open d70-t opened 2 months ago

d70-t commented 2 months ago

The start page currently mentions the existence of "output data standardized on a common (HEALPix) grid". During the nextGEMS project, we've discovered that hierarchies and chunking are key and that HEALPix can help. I want to emphasize this: HEALPix is nice and helps a lot, but its role is that of a supporter; it's not doing all the work. The main benefits we've seen came from applying chunking and multiple resolutions, as well as from the single dataset view.
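To illustrate what "multiple resolutions" can look like in practice, here is a minimal sketch and not a description of the nextGEMS tooling. It assumes the data are stored in HEALPix nested ordering, where the four children of cell `p` at one zoom level are cells `4*p` to `4*p + 3` at the next finer level, and that a plain mean is the right aggregation (reasonable for intensive quantities, since HEALPix cells have equal area). The function names `coarsen_nested` and `build_hierarchy` are hypothetical.

```python
from statistics import fmean


def coarsen_nested(values):
    """Average groups of 4 sibling cells to get the next coarser zoom level.

    Assumes `values` is one field in HEALPix *nested* ordering.
    """
    if len(values) % 4 != 0:
        raise ValueError("cell count must be a multiple of 4")
    return [fmean(values[4 * i:4 * i + 4]) for i in range(len(values) // 4)]


def build_hierarchy(values, zoom):
    """Return {zoom_level: field} for every level from `zoom` down to 0."""
    if len(values) != 12 * 4**zoom:
        raise ValueError(f"expected {12 * 4**zoom} cells at zoom {zoom}")
    levels = {zoom: list(values)}
    for z in range(zoom - 1, -1, -1):
        levels[z] = coarsen_nested(levels[z + 1])
    return levels


# Usage: a synthetic zoom-1 field (48 cells) reduced to zoom 0 (12 cells).
fine = [float(i) for i in range(48)]
levels = build_hierarchy(fine, zoom=1)
print(len(levels[0]))  # 12
```

Storing every level of such a hierarchy side by side is what lets a user pick a coarse view for quick global statistics and a fine view for regional detail from the same dataset.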

In order to have some standardized model output, there are more things to specify, which must be done in advance as it involves on-disk data reorganization:

There are other options which probably need specification, but it might be sufficient to do so at a later stage.

bjorn-stevens commented 2 months ago

Thanks... I will add this in my next edits, as I've been working on the protocol. I'm still not fully adapted to the git workflow and need a faster turnaround to get it in my neurons.

Once my pull request is accepted I will start the next round of edits and incorporate this input. Continued engagement of d70-t is welcome.

florianziemen commented 2 months ago

I've added a few details on the technical page in #15. Needs more work, though.

florianziemen commented 2 months ago

I'd be totally up for a set of standard tests that a dataset has to pass. For example, we expect a catalog where we can request (time='P1D', zoom=0) on a dataset and get something with 12 cells and 360-400 time steps, and where the mean of 'tas' across the 'cell' and 'time' dimensions ends up somewhere between 280 and 300 (to ensure the correct units). Similarly for 'pr', a test to ensure that it has the correct units and aggregation type / ... and so on...
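A rough sketch of what such a check could look like, using only the numbers mentioned above (12 cells at zoom 0, 360-400 daily steps, mean 'tas' between 280 and 300). It deliberately leaves out the catalog access and works on a plain nested list indexed as `[time][cell]`; the function names `healpix_cell_count` and `check_daily_tas` are made up for illustration and are not part of any existing tool.

```python
from statistics import fmean


def healpix_cell_count(zoom):
    """Number of HEALPix cells at a given zoom level (12 at zoom 0)."""
    return 12 * 4**zoom


def check_daily_tas(tas, zoom=0):
    """Return a list of problems found in a daily 'tas' field.

    `tas` is a sequence of time steps, each a sequence of cell values.
    An empty list means all checks passed.
    """
    problems = []

    n_time = len(tas)
    if not 360 <= n_time <= 400:
        problems.append(f"expected 360-400 time steps, got {n_time}")

    expected_cells = healpix_cell_count(zoom)
    if any(len(step) != expected_cells for step in tas):
        problems.append(f"expected {expected_cells} cells per time step")

    # HEALPix cells have equal area, so an unweighted mean is a global mean.
    values = [v for step in tas for v in step]
    if values:
        mean = fmean(values)
        if not 280.0 <= mean <= 300.0:
            problems.append(f"mean tas {mean:.1f} outside 280-300 (units?)")

    return problems


# Usage: one year of daily data at zoom 0, once in kelvin, once in Celsius.
in_kelvin = [[288.0] * 12 for _ in range(365)]
in_celsius = [[15.0] * 12 for _ in range(365)]
print(check_daily_tas(in_kelvin))   # []
print(check_daily_tas(in_celsius))  # ['mean tas 15.0 outside 280-300 (units?)']
```

Returning a list of problems rather than raising on the first failure makes it easier to report everything that is wrong with a dataset in one pass. A matching check for 'pr' would need agreed bounds and an agreed aggregation type, which are not specified here.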