google-research / neuralgcm

Hybrid ML + physics model of the Earth's atmosphere
https://neuralgcm.readthedocs.io
Apache License 2.0
704 stars 82 forks source link

Too large RMSE scores #82

Open tobiasselz opened 5 months ago

tobiasselz commented 5 months ago

Hello!

Thanks for publishing the project! It was easy to run the model and to get decent looking forecasts. With several test runs I was however unable to reproduce the good scores that are published on WeatherBench2 or in your paper. For example, if I compute the 500hPa geopotential RMSE of the 0.7deg deterministic model version with the current inference_demo.ipynb script by adding the following lines at the end

sqdiff = (combined_ds.geopotential.sel(level=500, model='NeuralGCM')-combined_ds.geopotential.sel(level=500, model='ERA5'))**2
weight = np.cos(np.deg2rad(combined_ds.latitude))
rmse = (sqdiff*weight).mean(['longitude', 'latitude'])**0.5

I get a RMSE of 292.47699012 m^2/s^2 at 72 hours lead time which is way larger than the average 116 m^2/s^2 on WeatherBench. Also for other dates and lead times the RMSE I get is significantly larger than on WeatherBench or than the IFS forecast. I wonder if there is an error on my side or if there is an issue with the model.

Thank you, Tobias

shoyer commented 5 months ago

Thanks for asking! We should definitely add a quick error check to the end of our demo notebook to make it easier to spot check a location installation.

There is a small error in your evaluation code: it use normalized the weights, rather than only multiplying by cos(lat), i.e.,

sqdiff = (combined_ds.geopotential.sel(level=500, model='NeuralGCM')-combined_ds.geopotential.sel(level=500, model='ERA5'))**2
weight = np.cos(np.deg2rad(combined_ds.latitude))
rmse2 = sqdiff.weighted(weight).mean(['longitude', 'latitude'])**0.5

When I add this output to a cell at the end of the demo notebook and run it in Google Colab using (using an L4 GPU), I see the following for our 0.7 degree model:

>>> print(rmse2)
<xarray.DataArray 'geopotential' (time: 4)>
array([144.24580492,  62.31808467,  85.08943717, 120.11783857])
Coordinates:
    level    int64 500
  * time     (time) int64 0 24 48 72

An RMSE score of 120 m^2/s^2 at 72 hours is roughly what I would expect for this model.

So something seems to be inconsistent on your end (vs Google Colab). Happy to help debug if you get any specific clues about what is going wrong.

tobiasselz commented 5 months ago

Thanks, that helped a lot! You are right, I missed a factor sqrt(pi/2), but the problem remains. I will try to figure out what's going wrong and maybe get back to you.