NVIDIA / earth2mip

Earth-2 Model Intercomparison Project (MIP) is a python framework that enables climate researchers and scientists to inter-compare AI models for weather and climate.
https://nvidia.github.io/earth2mip/
Apache License 2.0
187 stars 41 forks source link

Perturbation Methods Refactoring #137

Closed NickGeneva closed 9 months ago

NickGeneva commented 9 months ago

Earth-2 MIP Pull Request

Description

First pass at refactoring perturbation methods. Added pretty comprehensive testing. Util to help with normalization.

Moving refactors to beta folder.

Closes: https://github.com/NVIDIA/earth2mip/issues/128

Checklist

Dependencies

NickGeneva commented 9 months ago

/blossom-ci

NickGeneva commented 9 months ago

/blossom-ci

NickGeneva commented 9 months ago

/blossom-ci

nbren12 commented 9 months ago

Can these noise classes be geo operators? This will ensure seamless interoperability with the rest of the code (e.g. the diagnostics, the regridding functionality, etc).

this also captures the grid information in the explicit grid object LatLonGrid(which could be statically checked with mypy) rather than passing around dicts of coordinate info that need to be validated at runtime.

this docs page explains a bit more about this static metadata philosphy:

A design philosphy we follow is that model wrappers should take plain torch tensors as inputs and outputs and provide static metadata (e.g. grid, channel, time) about how those tensors map onto the planet. An alternative philosophy is to use some kind of metadata-aware container either custom-built or off-the-shelf like xarray. We avoid the latter approach since there is always a temptation to implement all of mathematics on some container type or hide the arrays under several layers of containers.

while this PR does not use a dynamic container for the data to be perturbed, it does for the coordinate info.

Just pointing out the current patterns this repo uses to capture coordinate metadata and the reasoning for that static/explicit approach compared to the one used in this PR. Seems like you were having some back and forth on this topic (e.g. what order should lat/lon be in), so I thought this may be helpful.

NickGeneva commented 9 months ago

/blossom-ci

NickGeneva commented 9 months ago

Thanks Noah, Presently experimenting with explicitly passing coordinate systems with tensors as the primary dataflow. Hence won't see many connections to items that rely on properties for channels, grids, etc. Hopefully enabling some better hook ups with internal code. In the works still.

nbren12 commented 9 months ago

Presently experimenting with explicitly passing coordinate systems with tensors as the primary dataflow.

Sure. Perhaps we can make a decision (properties vs args) when I return and apply it consistently across the code. I'll make a note on this.

Presently experimenting with explicitly passing coordinate systems with tensors as the primary dataflow.

It could be possible to pass the LatLonGrid dataclass instead of dict and take advantage of static goodness (i.e. no KeyErrors) and some shared utilities.

nbren12 commented 9 months ago

And I do see the value of such experimentation but wonder if we could reduce future work by finishing the design before refactoring a big chunk of code.

Would an end-to-end prototype be helpful? For an ensemble forecast, you could refactor or mock a single data source, time-stepping model, diagnostic model, and perturbation method. Then, a short example script could be written with the new design and we could compare quickly compare with the existing approach and refine it.