Deltares / hydromt_wflow

Wflow plugin for HydroMT
https://deltares.github.io/hydromt_wflow/
GNU General Public License v3.0

Round data in staticmaps.nc #231

Open hboisgon opened 7 months ago

hboisgon commented 7 months ago

Kind of request

Changing existing functionality

Enhancement Description

We already have this request for staticgeoms, but I think it would be good practice to round all grids in staticmaps.nc. The number of decimals currently stored does not make sense, e.g.:

[image not preserved]

Use case

It would produce a smaller staticmaps.nc file. Not sure whether it would have any impact on computation speed.

Additional Context

No response

Huite commented 7 months ago

Hi @hboisgon,

I saw an issue like this come up in my mentions earlier, for Wflow.jl: https://github.com/Deltares/Wflow.jl/issues/314

As I mention there: you generally don't want to round binary floating-point numbers. A float32 will always take 32 bits of memory, and a float64 will always take 64 bits, no matter how many decimals are meaningful. You might get smaller files if you turn compression on, and rounding might help a little since it reduces the information content (so the compression algorithm can find more redundancy), but you need to turn on compression in either case.
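To make that concrete, here is a minimal sketch using only the Python standard library. The values, the 0.1 precision, and the variable names are made up for illustration; this is not how hydromt_wflow writes staticmaps.nc, just a way to see the effect on raw versus zlib-compressed float32 bytes.

```python
import random
import zlib
from array import array

random.seed(0)

# Hypothetical grid values (say, river widths in metres); not real model data.
values = [random.uniform(0.0, 100.0) for _ in range(100_000)]

# Store both versions as float32 ("f" is a 4-byte C float on common platforms).
full = array("f", values)
rounded = array("f", (round(v, 1) for v in values))

raw_full = full.tobytes()
raw_rounded = rounded.tobytes()

# Rounding does not change the uncompressed size: still 4 bytes per value.
print("raw bytes:       ", len(raw_full), len(raw_rounded))

# With compression on, the rounded data contains far fewer distinct bit
# patterns, so zlib can find more redundancy.
zip_full = zlib.compress(raw_full, 6)
zip_rounded = zlib.compress(raw_rounded, 6)
print("compressed bytes:", len(zip_full), len(zip_rounded))
```

The gain comes entirely from the compression step; without it the two arrays occupy exactly the same space. In xarray, zlib compression is requested per variable through the `zlib` and `complevel` encoding keys when writing with the netCDF4 backend.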

But if you're looking to reduce file sizes, I recommend investigating compression instead. NetCDF4 supports zlib compression out of the box (other codecs depend on the linked HDF5 library and plugins), whereas Zarr, for example, uses Blosc for far more performant compression.

With regard to the physical interpretation: if you want to add that, you should probably try adding metadata instead. You could argue that a river width is never more accurate than 1 cm (for example), but that doesn't generalize, e.g. if you're doing computational/numerical experiments.

And in that case you should do error propagation proper! That's stuff like this: https://github.com/JuliaPhysics/Measurements.jl https://pythonhosted.org/uncertainties/

And then ideally support it in an xarray package like pint does: https://xarray.dev/blog/introducing-pint-xarray

shartgring commented 3 months ago

This may also relate to https://docs.xarray.dev/en/latest/user-guide/io.html#writing-encoded-data. I read online (https://github.com/pydata/xarray/issues/865 and https://github.com/pydata/xarray/issues/1572) that lossy compression is possible and may go hand in hand with rounding the data, as accuracy is guaranteed for a certain number of digits, I guess similar to this: https://docs.unidata.ucar.edu/netcdf-c/current/md__media_psf_Home_Desktop_netcdf_releases_v4_9_2_release_netcdf_c_docs_quantize.html

I am not sure how this would work with zlib: is it lossy or lossless, and can the two be combined?

Huite commented 3 months ago

It looks a bit like a breadcrumbs trail to be honest, as xarray doesn't just provide an overview -- which is reasonable, since it depends on what's available in the netCDF4 / HDF5 binaries.

The relevant netCDF4-python docs (https://unidata.github.io/netcdf4-python/#efficient-compression-of-netcdf-variables) say:

"zlib compression is always available, szip is available if the linked HDF5 library supports it, and zstd, bzip2, blosc_lz, blosc_lz4, blosc_lz4hc, blosc_zlib and blosc_zstd are available via optional external plugins."

For hydromt, you can safely assume that the binary origin is conda-forge so whatever plugins are compiled there are relevant.

More info is probably only available on the netCDF docs directly, among them quantizing: https://docs.unidata.ucar.edu/netcdf-c/current/md__media_psf_Home_Desktop_netcdf_releases_v4_9_2_release_netcdf_c_docs_quantize.html

Pretty confident that zlib is lossless.
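A quick way to see how the two can combine, as a sketch under assumptions: the lossy part is a quantization step applied to the values first, and zlib then compresses the quantized bytes without further loss. The `shave_bits` helper below is a hypothetical, simplified bit-shaving routine written for this example; it is not the algorithm netCDF uses for its quantize feature, and `keep_bits=10` is an arbitrary choice.

```python
import random
import struct
import zlib


def shave_bits(value: float, keep_bits: int) -> float:
    """Zero the trailing mantissa bits of a float32, keeping keep_bits of 23.

    Hypothetical helper for illustration only.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    mask = (0xFFFFFFFF << (23 - keep_bits)) & 0xFFFFFFFF
    (shaved,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return shaved


random.seed(0)
values = [random.uniform(1.0, 100.0) for _ in range(50_000)]  # made-up data

# Lossy step: quantize each value (this is where precision is given up).
quantized = [shave_bits(v, keep_bits=10) for v in values]
raw_plain = struct.pack(f"<{len(values)}f", *values)
raw_quantized = struct.pack(f"<{len(quantized)}f", *quantized)

# Lossless step: zlib gives back exactly the bytes it was handed.
packed_plain = zlib.compress(raw_plain, 6)
packed_quantized = zlib.compress(raw_quantized, 6)
is_lossless = zlib.decompress(packed_quantized) == raw_quantized

# The only error comes from the quantization, and it is bounded relative
# to the magnitude of each value.
max_rel_err = max(abs(q - v) / v for q, v in zip(quantized, values))

print("zlib round trip exact:", is_lossless)
print("compressed bytes (plain, quantized):", len(packed_plain), len(packed_quantized))
print("max relative error:", max_rel_err)
```

So the two are not alternatives: quantization decides how much precision to discard, and zlib (or any other lossless codec) then benefits from the zeroed trailing bits.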

Best approach IMO is to set up a new pixi env, see which schemes work, and make some examples. Would be useful documentation anyway!