ecmwf / climetlab

Python package for easy access to weather and climate data
Apache License 2.0

add support for loading single-precision data from GRIB files #40

Closed: mishooax closed this issue 2 years ago

mishooax commented 2 years ago

@floriankrb - hi Florian, it'd be great if we could load data from GRIB files (like ERA5 / WeatherBench) in single precision (float32) instead of np.float64, which appears to be the current default. Reading float64s and converting to float32 for ML seems rather wasteful. I'd be happy to assist with this if possible. Cheers, ~M

floriankrb commented 2 years ago

TLDR: yes, we will add this as an option in .to_tfdataset() or .to_pytorch(), or make it the default.

Thanks for the discussion about this. I am adding a short summary here before closing.

Data is not stored in GRIB files as float64; it is closer to a float10 or float16 (depending on the data: I think temperature needs less precision than chemical properties). How best to compress the data inside a GRIB file is an ongoing discussion. When reading the data, float64 can be a requirement if some computation is performed (such as interpolation). The actual size in memory is not a major issue. Moreover, changing to float32 here would require major development in the C library, and this is not planned until proven useful.

On the other hand, converting float64 to float32 on the fly (i.e. as the data is read from GRIB) is fast and easy, and it will be implemented in climetlab. Generally speaking, using less memory to store the data is a good idea, and there are several places where float64 is wasteful. Let's optimize only the places where there is a proven bottleneck.
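
For illustration, here is a minimal sketch of what converting on the fly could look like, using only NumPy. It is not the climetlab implementation: the `to_float32` helper and the `decoded` list are hypothetical, and the random arrays merely stand in for float64 values decoded from GRIB messages.

```python
import numpy as np


def to_float32(fields):
    """Yield each decoded field as float32, one field at a time.

    Hypothetical helper: `fields` is any iterable of array-like values,
    standing in for what a GRIB reader would return as float64.
    """
    for values in fields:
        # copy=False skips the copy when the input is already float32.
        yield np.asarray(values).astype(np.float32, copy=False)


# Assumption: random float64 arrays play the role of decoded GRIB fields.
rng = np.random.default_rng(0)
decoded = [rng.normal(280.0, 15.0, size=(181, 360)) for _ in range(4)]

# Only one float64 field needs converting at a time; the stacked batch
# that would be handed to an ML framework is float32.
batch = np.stack(list(to_float32(decoded)))

float64_bytes = sum(a.nbytes for a in decoded)
print(batch.dtype, batch.nbytes, float64_bytes)
```

Converting field by field means the float32 batch takes half the memory of the float64 fields it was built from, and the reader itself stays unchanged.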