google-research / arco-era5

Recipes for reproducing Analysis-Ready & Cloud Optimized (ARCO) ERA5 datasets.
https://cloud.google.com/storage/docs/public-datasets/era5
Apache License 2.0

Store data in uint16 #66

Open loliverhennigh opened 9 months ago

loliverhennigh commented 9 months ago

Hey,

Totally amazing project. I wanted to ask about compression and uint16 storage. When I download lat/lon ERA5 data from the CDS API, the data is compressed and stored as uint16, along with scale factors to convert it back to float32. The dataset here seems to be stored as full float32. Is there any reason not to store it in the original uint16? That would save a tremendous amount of bandwidth, which can make a big difference when using ERA5 in ML workflows.
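For anyone unfamiliar with this kind of storage, here is a minimal sketch of linear 16-bit packing, assuming the usual `unpacked = packed * scale + offset` relationship. The function names and the synthetic temperature field are hypothetical, and this is not meant to reproduce the exact encoding used by the CDS API or ECMWF.

```python
import numpy as np

def pack_uint16(values):
    """Pack a float array into uint16 codes plus a linear scale/offset pair."""
    vmin = float(values.min())
    vmax = float(values.max())
    # Spread the value range over the 65536 available integer codes.
    # Fall back to 1.0 for a constant field so we never divide by zero.
    scale = (vmax - vmin) / 65535.0 or 1.0
    offset = vmin
    codes = np.round((values.astype(np.float64) - offset) / scale)
    packed = np.clip(codes, 0, 65535).astype(np.uint16)
    return packed, scale, offset

def unpack_uint16(packed, scale, offset):
    """Convert uint16 codes back to float32."""
    return (packed.astype(np.float64) * scale + offset).astype(np.float32)

# Synthetic stand-in for a 2D temperature field in kelvin (made-up values).
rng = np.random.default_rng(0)
values = rng.normal(280.0, 15.0, size=(181, 360)).astype(np.float32)

packed, scale, offset = pack_uint16(values)
restored = unpack_uint16(packed, scale, offset)
max_error = float(np.abs(restored - values).max())

print("bytes as float32:", values.nbytes)
print("bytes as uint16: ", packed.nbytes)
print("max round-trip error:", max_error, "quantization step:", scale)
```

The packed array takes half the bytes of the float32 original, and the round-trip error is bounded by roughly half a quantization step, so the precision depends on the range of values in the array being packed.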

shoyer commented 3 months ago

This is definitely worth investigating. I know that we didn't do this for some 3D fields because the scale/offset factors that ECMWF uses differ for the same variable at different vertical levels in the atmosphere.
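To illustrate why per-level factors matter, here is a rough sketch on synthetic data. It assumes a hypothetical `(level, lat, lon)` field whose magnitude differs by several orders between two levels. The values and helper names are made up, and this is not the ARCO-ERA5 pipeline or ECMWF's actual encoding.

```python
import numpy as np

def linear_params(a):
    """Return (scale, offset) that map the range of `a` onto uint16 codes."""
    vmin, vmax = float(a.min()), float(a.max())
    return ((vmax - vmin) / 65535.0 or 1.0), vmin

def roundtrip(a, scale, offset):
    """Pack to uint16 with the given factors, then unpack to float64."""
    codes = np.round((a.astype(np.float64) - offset) / scale)
    packed = np.clip(codes, 0, 65535).astype(np.uint16)
    return packed.astype(np.float64) * scale + offset

rng = np.random.default_rng(0)
# Made-up magnitudes: large values at the first level, tiny at the second.
magnitudes = np.array([1e-2, 1e-6])
field = (rng.random((2, 8, 16)) * magnitudes[:, None, None]).astype(np.float32)

# Option A: one scale/offset pair for the whole 3D array.
scale_g, offset_g = linear_params(field)
err_global = float(np.abs(roundtrip(field[1], scale_g, offset_g) - field[1]).max())

# Option B: a separate scale/offset pair for each vertical level.
scale_l, offset_l = linear_params(field[1])
err_level = float(np.abs(roundtrip(field[1], scale_l, offset_l) - field[1]).max())

print("upper-level max error, single global pair:", err_global)
print("upper-level max error, per-level pair:    ", err_level)
```

With a single pair, the level with small values gets only a handful of distinct codes, so its error is far larger than with its own pair. Keeping per-level pairs means the factors become arrays along the level dimension rather than one scalar pair per variable, which is the complication described above.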