Refactor: Use TimeLoop interface in inference_ensemble

NVIDIA / earth2mip

Earth-2 Model Intercomparison Project (MIP) is a python framework that enables climate researchers and scientists to inter-compare AI models for weather and climate.

https://nvidia.github.io/earth2mip/

Apache License 2.0

187 stars, 41 forks

Refactor: Use TimeLoop interface in inference_ensemble #58

Closed: nbren12 closed this issue 11 months ago

nbren12 commented 11 months ago

For legacy reasons, inference_ensemble requires methods/attributes not present in the TimeLoop abstraction, and is tightly coupled to earth2mip.networks.Inference. The main source of coupling is related to normalization.

To get around this, the perturbation should be applied in physical units, not in normalized values. Here are the steps that need to happen:

Load the mean/std here: https://github.com/NVIDIA/earth2mip/blob/5999d7e4f3a95e89f1db64dc5e6644f405bfd02e/earth2mip/inference_ensemble.py#L231. This data will need to be stored somewhere; I'm open to suggestions. Since it is a limited amount of data, I would be okay storing a text/data file with many channels' worth of means/stds inside the repo.
Unnormalize and renormalize before/after this line: https://github.com/NVIDIA/earth2mip/blob/5999d7e4f3a95e89f1db64dc5e6644f405bfd02e/earth2mip/inference_ensemble.py#L261.
Remove normalization handling here and here.
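
A rough sketch of one way to read the steps above (this is not the actual earth2mip code): if `mean` and `std` are per-channel statistics, a perturbation written for normalized values can be wrapped so that it accepts and returns physical-unit data. The function names, array shapes, and values below are assumptions made up for illustration.

```python
import numpy as np


def perturb_normalized(x_norm, noise, amplitude):
    """Legacy-style perturbation: additive noise on normalized values."""
    return x_norm + amplitude * noise


def perturb_physical(x, noise, amplitude, mean, std):
    """Apply the same perturbation to physical-unit data.

    Normalization is handled inside this function, so the caller
    never sees normalized values.
    """
    x_norm = (x - mean) / std  # physical units -> normalized
    x_norm = perturb_normalized(x_norm, noise, amplitude)
    return x_norm * std + mean  # normalized -> physical units


# Hypothetical data: 2 channels on a 4x8 grid.
rng = np.random.default_rng(0)
mean = np.array([280.0, 5.0]).reshape(2, 1, 1)
std = np.array([20.0, 3.0]).reshape(2, 1, 1)
x = mean + std * rng.standard_normal((2, 4, 8))
noise = rng.standard_normal((2, 4, 8))

x_perturbed = perturb_physical(x, noise, 0.05, mean, std)

# For additive noise this is the same as scaling the noise by the
# per-channel std and adding it directly in physical units.
assert np.allclose(x_perturbed, x + 0.05 * std * noise)
```

With this shape, the ensemble loop only needs a model that takes and returns physical-unit fields, plus the stored mean/std table.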

Is that clear?

yairchn commented 11 months ago

@nbren12 Do you think it matters what the normalization is, as long as we know how to denormalize later? What I mean is that I can get the mean and std of the initial condition and use them for norm and denorm if/when needed. This means that two runs with different ICs might have different mean and std, but since they are global I wonder if the small difference is going to change much.
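
For illustration only, the initial-condition-dependent idea could look something like the sketch below. The `(channel, lat, lon)` layout, the `global_stats` name, the unweighted statistics, and the random data are all assumptions, not earth2mip code.

```python
import numpy as np


def global_stats(ic):
    """Per-channel mean and std over the spatial axes of one initial condition.

    Assumes ic has shape (channel, lat, lon); no area weighting is applied.
    """
    mean = ic.mean(axis=(1, 2), keepdims=True)
    std = ic.std(axis=(1, 2), keepdims=True)
    return mean, std


rng = np.random.default_rng(0)
ic = 280.0 + 20.0 * rng.standard_normal((2, 4, 8))  # made-up data

mean, std = global_stats(ic)
normalized = (ic - mean) / std
restored = normalized * std + mean  # denormalize with the same stats
assert np.allclose(restored, ic)
```

The round trip is exact for any single run, but the statistics, and so the effective perturbation amplitude in physical units, would vary from one initial condition to the next.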

nbren12 commented 11 months ago

It seems it would be nice to do this in a backwards-compatible way. Normalizing with an initial-condition-dependent value is convenient, but is different from how we've been doing it. Is there some barrier to getting the mean/std as I suggested?

yairchn commented 11 months ago

No, it was just a thought on how to avoid storing extra txt files in the repo. I will go with what you suggested.

NickGeneva commented 11 months ago

The key here is that the perturbation method needs to know how to handle its normalization / denormalization on its own.

If the normalization depends on the initial condition, then that should be baked right into the perturb function itself as logic that just processes the data it's given.

Any reason for ASCII? We could store them as npy array files or torch pt files.

nbren12 commented 11 months ago

I'm agnostic about the format, but this is a few hundred values, so text seems OK, and text is easier to inspect. It might even be handy for quick reference.

Whatever the format, I suggest one file that contains a table of channel names, means, and stds.
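
As a minimal sketch of what such a table might look like as text, here is one possible CSV layout and loader. The column names, the `load_stats` helper, and the channel names and values are placeholders, not the format that was actually adopted.

```python
import csv
import io

# Hypothetical file contents; channel names and values are placeholders.
STATS_TEXT = """channel,mean,std
t2m,280.0,20.0
u10m,0.5,5.0
"""


def load_stats(fileobj):
    """Return {channel: (mean, std)} from a CSV table with a header row."""
    reader = csv.DictReader(fileobj)
    return {row["channel"]: (float(row["mean"]), float(row["std"])) for row in reader}


# In a real package, fileobj would be an open handle to the stored file.
stats = load_stats(io.StringIO(STATS_TEXT))
mean, std = stats["t2m"]
```

A single table keyed by channel name keeps the values easy to inspect and lets each model pick out only the channels it uses.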

NickGeneva commented 11 months ago

Whatever is simplest to load and isn't too confusing :)

Note: remember to add this to the MANIFEST.in, otherwise the pip install will ignore it.

nbren12 commented 11 months ago

Resolved by #64