USGS-CMG / stglib

Routines used by the USGS Coastal/Marine Hazards & Resources Program to process oceanographic time-series data

Add compression to encoding #239

Open dnowacki-usgs opened 5 months ago

dnowacki-usgs commented 5 months ago

We can add lossless compression to the encoding dict so that the netCDF files are smaller. I think this makes sense to enable across the board...?
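For illustration, here is a minimal sketch of what adding compression to an encoding dict could look like. The helper name `with_compression` and the variable names are hypothetical and not taken from stglib; `zlib` and `complevel` are the encoding keys xarray accepts for its netCDF4 backend, and the level of 4 is an arbitrary choice.

```python
def with_compression(var_names, encoding=None, complevel=4):
    """Return a copy of `encoding` with zlib settings added for each variable.

    Hypothetical helper: existing per-variable options (dtype, _FillValue, ...)
    are preserved and the input dict is not modified.
    """
    merged = {name: dict(opts) for name, opts in (encoding or {}).items()}
    for name in var_names:
        # zlib (deflate) is lossless; complevel ranges from 0 to 9
        merged.setdefault(name, {}).update({"zlib": True, "complevel": complevel})
    return merged


# Example variable names are made up for this sketch
enc = with_compression(["u_1205", "v_1206"], {"u_1205": {"dtype": "float32"}})
print(enc)

# With an xarray Dataset `ds`, this could be applied across the board as:
#   ds.to_netcdf("out.nc", encoding=with_compression(ds.data_vars))
```

Merging into a copy keeps any encoding that is already set for a variable, so compression can be switched on without touching the rest of the dict.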

dnowacki-usgs commented 3 months ago

Except that, according to the xarray docs:

Chunk based gzip compression can yield impressive space savings, especially for sparse data, but it comes with significant performance overhead. HDF5 libraries can only read complete chunks back into memory, and maximum decompression speed is in the range of 50-100 MB/s. Worse, HDF5’s compression and decompression currently cannot be parallelized with dask. For these reasons, we recommend trying discretization based compression (described above) first.
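For comparison, the "discretization based compression" the docs mention means packing floats into small integers with `scale_factor` and `add_offset`. Unlike zlib it is lossy, so it does not meet the lossless goal stated above. A rough sketch, assuming 16-bit signed storage with -32768 reserved as a fill value; the function names are hypothetical:

```python
def packing_params(vmin, vmax, nbits=16):
    """Compute scale_factor and add_offset for packing [vmin, vmax] into signed ints.

    Assumption: the most negative integer is left free for use as _FillValue,
    so the data map onto the symmetric range [-(2**(nbits-1) - 1), 2**(nbits-1) - 1].
    """
    span = vmax - vmin
    scale_factor = span / (2**nbits - 2) if span else 1.0
    add_offset = (vmax + vmin) / 2
    return scale_factor, add_offset


def pack(value, scale_factor, add_offset):
    """Float -> packed integer (this rounding is where precision is lost)."""
    return round((value - add_offset) / scale_factor)


def unpack(packed, scale_factor, add_offset):
    """Packed integer -> float, as a reader would decode it."""
    return packed * scale_factor + add_offset


scale, offset = packing_params(0.0, 65534.0)
print(scale, offset, pack(0.0, scale, offset), pack(65534.0, scale, offset))
```

In xarray these values would go into a variable's encoding along the lines of `{"dtype": "int16", "scale_factor": scale, "add_offset": offset, "_FillValue": -32768}`. The resolution becomes `scale_factor`, so whether this is acceptable depends on the precision each variable needs.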