NOAA-ORR-ERD / LibGOODS

Library for accessing data useful for the NOAA / GNOME model
https://libgoods.readthedocs.io/en/latest/

optimal netcdf storage #18

Closed ChrisBarker-NOAA closed 2 years ago

ChrisBarker-NOAA commented 2 years ago

This is here as a reminder:

We should make sure that the netcdf files created are reasonably optimal for use with GNOME. That means:

lukecampbell commented 2 years ago

This is complete with the latest available versions of model_catalogs and LibGOODS.

lukecampbell commented 2 years ago

The chunk sizes are, for the most part, intelligently chosen. Example from TBOFS:

output git:(rotating) ncdump -h TBOFS_nowcast_20220823-20220824.nc | grep _Chunk
                zeta:_ChunkSizes = 1, 290, 176 ;
                ocean_time:_ChunkSizes = 512 ;
                wetdry_mask_psi:_ChunkSizes = 1, 289, 175 ;
                wetdry_mask_rho:_ChunkSizes = 1, 290, 176 ;
                wetdry_mask_u:_ChunkSizes = 1, 290, 175 ;
                wetdry_mask_v:_ChunkSizes = 1, 289, 176 ;
                u:_ChunkSizes = 1, 11, 290, 175 ;
                v:_ChunkSizes = 1, 11, 289, 176 ;
                temp:_ChunkSizes = 1, 11, 290, 176 ;
                salt:_ChunkSizes = 1, 11, 290, 176 ;
                Uwind:_ChunkSizes = 1, 290, 176 ;
                Vwind:_ChunkSizes = 1, 290, 176 ;
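
A listing like the one above can also be checked programmatically. The sketch below parses `_ChunkSizes` lines from ncdump-style text and flags multi-dimensional variables whose leading chunk dimension is not 1. It assumes that one time step per chunk is what suits time-stepped reads; that criterion is an assumption, not something stated in this thread, and `parse_chunk_sizes` is a hypothetical helper name.

```python
import re

# Matches lines such as "    zeta:_ChunkSizes = 1, 290, 176 ;"
_CHUNK_RE = re.compile(r"^\s*(\w+):_ChunkSizes\s*=\s*([\d,\s]+);")


def parse_chunk_sizes(ncdump_text):
    """Map variable name -> tuple of chunk sizes found in ncdump header text."""
    chunks = {}
    for line in ncdump_text.splitlines():
        match = _CHUNK_RE.match(line)
        if match:
            name, sizes = match.groups()
            chunks[name] = tuple(int(s) for s in sizes.split(","))
    return chunks


# A few lines copied from the TBOFS listing.
sample = """\
    zeta:_ChunkSizes = 1, 290, 176 ;
    ocean_time:_ChunkSizes = 512 ;
    u:_ChunkSizes = 1, 11, 290, 175 ;
"""
chunks = parse_chunk_sizes(sample)

# Assumption: multi-dimensional variables should hold one time step per chunk.
multi_step = [name for name, c in chunks.items() if len(c) > 1 and c[0] != 1]
print(chunks)
print("variables with more than one time step per chunk:", multi_step)
```

One-dimensional variables such as ocean_time are skipped by the check, since their single chunk dimension is the time axis itself.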

Dimension order is correct, for the most part. The dimension order is specified by the provider, and so far it has been correct (t, z, y, x).
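
A minimal sketch of how the (t, z, y, x) ordering could be verified for a variable's dimension names. The mapping below uses ROMS-style dimension names as an assumption; only ocean_time appears in the listing above, and both `AXIS_OF` and `has_canonical_order` are hypothetical names introduced for illustration.

```python
# Hypothetical mapping from dimension name to axis role; adjust per model.
AXIS_OF = {
    "ocean_time": "t",
    "s_rho": "z",
    "eta_rho": "y", "eta_u": "y", "eta_v": "y",
    "xi_rho": "x", "xi_u": "x", "xi_v": "x",
}
CANONICAL = "tzyx"


def has_canonical_order(dims):
    """True if the recognized axes in `dims` appear in (t, z, y, x) order."""
    axes = [AXIS_OF[d] for d in dims if d in AXIS_OF]
    ranks = [CANONICAL.index(a) for a in axes]
    # Strictly increasing ranks: correct order and no axis repeated.
    return ranks == sorted(ranks) and len(set(ranks)) == len(ranks)


print(has_canonical_order(("ocean_time", "s_rho", "eta_u", "xi_u")))
print(has_canonical_order(("xi_rho", "eta_rho")))
```

Dimension names missing from the mapping are ignored, so a provider with different naming would need its own entries.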

Compression: this could be a PhD-thesis-level discussion. We don't currently support lossy compression schemes, and several of the newer, fancier compression options are only available to netCDF clients with relatively new versions of the HDF5 and netCDF4 libraries. xarray supports compression options.
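
For illustration, xarray takes per-variable compression settings through the `encoding` argument of `Dataset.to_netcdf`. The sketch below only builds such a dictionary, so it runs without xarray installed. The helper name `zlib_encoding`, the variable list, and the compression level of 4 are assumptions chosen for the example, not settings taken from LibGOODS.

```python
def zlib_encoding(var_names, complevel=4, shuffle=True):
    """Build an xarray-style encoding dict requesting lossless zlib compression."""
    return {
        name: {"zlib": True, "complevel": complevel, "shuffle": shuffle}
        for name in var_names
    }


# Variable names taken from the TBOFS listing; the selection is arbitrary.
encoding = zlib_encoding(["zeta", "u", "v", "temp", "salt"])
print(encoding["temp"])

# With xarray installed and `ds` an xarray.Dataset (not defined here):
# ds.to_netcdf("out.nc", encoding=encoding)
```

Higher `complevel` values trade write time for smaller files; the right setting would need to be measured on real output.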

lukecampbell commented 2 years ago

I'm trying to find the default compression settings, but there may be a follow-up ticket to this if explicit lossless compression schemes are desired and supported by clients.

ChrisBarker-NOAA commented 2 years ago

In our case, we want the "best" lossless compression that is supported by the netCDF4 package as delivered by conda-forge -- I have no idea what those options might be.

But:

https://unidata.github.io/netcdf4-python/#efficient-compression-of-netcdf-variables

indicates that zlib compression is always available, so maybe use that in any case?

We also could consider truncating some of the data for better compression, if that makes sense for any of our variables -- that would take some thought -- probably for another day.
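
To show why truncation helps, here is a standard-library-only sketch that quantizes synthetic values to about two decimal places and compares zlib-compressed sizes. It is similar in spirit to the `least_significant_digit` option described in the netcdf4-python documentation linked above, but it is an illustration under assumptions: the data are random stand-ins, not model output, and `quantize` and `compressed_size` are hypothetical helpers.

```python
import math
import random
import struct
import zlib


def quantize(values, least_significant_digit):
    """Round values so about `least_significant_digit` decimal places survive.

    A power-of-two scale is used, so the low mantissa bits become zero
    and the packed bytes compress better.
    """
    # Smallest number of binary digits resolving 10**-least_significant_digit.
    bits = math.ceil(math.log2(10.0 ** least_significant_digit))
    scale = 2.0 ** bits
    return [round(v * scale) / scale for v in values]


def compressed_size(values):
    """Size in bytes of the values packed as float32 and deflated with zlib."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return len(zlib.compress(raw, 4))


random.seed(0)
# Synthetic stand-in for a temperature field in degrees C (assumption).
temps = [15.0 + 10.0 * random.random() for _ in range(10_000)]
truncated = quantize(temps, least_significant_digit=2)

full_size = compressed_size(temps)
truncated_size = compressed_size(truncated)
max_error = max(abs(a - b) for a, b in zip(temps, truncated))

print("full precision:", full_size, "bytes")
print("truncated:     ", truncated_size, "bytes")
print("max abs error: ", max_error)
```

Whether a given precision is acceptable depends on the variable, which is the part that would take some thought.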