NVIDIA / earth2mip

Earth-2 Model Intercomparison Project (MIP) is a python framework that enables climate researchers and scientists to inter-compare AI models for weather and climate.
https://nvidia.github.io/earth2mip/
Apache License 2.0
183 stars 40 forks source link

Improve lagged ensemble performance and versatility #154

Closed nbren12 closed 7 months ago

nbren12 commented 8 months ago

Earth-2 MIP Pull Request

Description

I have been trying to score graphcast, but because it is big, I was first saving the forecasts to hindcast directory using earht2mip.time_collection instead of scoring it online with earth2mip.lagged_ensembles. Unfortunately, there is some bug such that at the lead time =0 forecast is not the same as the initial condition.

This error is not present in the lagged ensemble script so I would like to use it to score graphcast directly. The lagged ensemble script currently does not handle the grid information properly and the scoring is very slow. This PR addresses both issues.

I add a memory and compute efficient implementation of CRPS in torch. The code relies on xskillscore/properscoring, but when run on the GPU this used the O(n^2) kernel method and runs out of memory when there are many channels. I did not use the modulus version of CRPS because it lacks an efficient yet exact implementation of CRPS. The kernel method is inefficient and the binned method is inexact. A bonus is that I can remove the xskillscores/properscoring dep. This code could be upstreamed to modulus core.

Checklist

Dependencies

nbren12 commented 8 months ago

/blossom-ci

nbren12 commented 8 months ago

@dallasfoster Thanks for the discussion. I feel it is good knowledge transfer, but perhaps a bit out of scope. Not sure if you have seen these docs: https://nvidia.github.io/earth2mip/userguide/concepts.html#forecast. If not, I highly recommend reading them to understand the current design of e2mip. This PR is however not the place to revisit all aspects of the design.

A few comments, some of which may turn into issues if desired. I think there is in this PR a pattern of code isolation that should be addressed in order to make the repo more modular.

Yes, this PR is focused on fixing some bugs and optimizing the lagged ensemble script, and some of the pieces may be useful in general, but that is something we can address later I think.

Let's upstream the crps code to modulus. I am okay with duplicating a 1-line helper for weighted averaging a few places in the code.

NickGeneva commented 8 months ago

@nbren12 looks in general fine to me, although right now I'm not fully familiar with the lagged ensembles. But changes outside seem good enough to get things fixed / discuss more later.

nbren12 commented 8 months ago

/blossom-ci

nbren12 commented 8 months ago

/blossom-ci

nbren12 commented 8 months ago

/blossom-ci

nbren12 commented 8 months ago

/blossom-ci

NickGeneva commented 8 months ago

/blossom-ci

nbren12 commented 8 months ago

/blossom-ci

nbren12 commented 7 months ago

/blossom-ci

nbren12 commented 7 months ago

/blossom-ci

nbren12 commented 7 months ago

/blossom-ci

nbren12 commented 7 months ago

/blossom-ci

nbren12 commented 7 months ago

/blossom-ci

nbren12 commented 7 months ago

The tests are failing when trying to use pytest-regtest: https://github.com/NVIDIA/earth2mip/actions/runs/7493428670/job/20399108089#step:2:1160

I'm not sure why, since the logs show it is being installed.

@NickGeneva Do you have any ideas about what the issue is?

NickGeneva commented 7 months ago

The tests are failing when trying to use pytest-regtest: https://github.com/NVIDIA/earth2mip/actions/runs/7493428670/job/20399108089#step:2:1160

I'm not sure why, since the logs show it is being installed.

@NickGeneva Do you have any ideas about what the issue is?

Not sure, when running locally I had to remove pytest-regtest from the TOML to get pytest to function at all. May need to force a lower version to like 1.5.1. Seems other are having similar issues with the latest release: https://gitlab.com/uweschmitt/pytest-regtest/-/issues/20

nbren12 commented 7 months ago

/blossom-ci

nbren12 commented 7 months ago

Thanks @NickGeneva for the tip, that fixed it. Merging now.