LLNL / H5Z-ZFP

A registered ZFP compression plugin for HDF5

Document how to use H5Z-ZFP with netCDF-4 #146

Open lindstro opened 1 month ago

lindstro commented 1 month ago

It would be nice to have documentation on how to use H5Z-ZFP in the context of netCDF-4. I've spent the last couple of days struggling with making this work, and I believe some additional documentation could save a lot of people grief.

First, the netCDF filter documentation mentions that HDF5 filters can indeed be used, e.g., from command-line tools like nccopy with the -F switch (similar to, but not the same as, the h5repack -f switch). There are a few things the documentation does not mention, however:

It seems that you cannot nccopy a file that has already been compressed, say, using zlib, to another compressed format. nccopy will simply silently ignore such requests and not use the requested compression filter. You first have to use -F none to copy the file to a temporary intermediate uncompressed file. And ncdump -h will not tell you whether or not the file has been compressed. For that, you need to use the -s switch also, e.g., ncdump -hs file.nc.
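The three steps described above (decompress to a temporary file, recompress with the desired filter, then inspect the result) can be sketched as a small Python helper. This is only an illustration: the function name `recompress_commands` and the file names are hypothetical, and the filter specification is the one worked out further down in this issue. The helper only builds the command lines; it does not run them.

```python
def recompress_commands(src, tmp, dst, filter_spec):
    """Return the three commands as argv lists; nothing is executed here."""
    return [
        # 1. Strip the existing compression into an intermediate file.
        ["nccopy", "-F", "none", src, tmp],
        # 2. Apply the requested filter to the uncompressed copy.
        ["nccopy", "-F", filter_spec, tmp, dst],
        # 3. Show the header including the special (filter) attributes.
        ["ncdump", "-hs", dst],
    ]

# Hypothetical file names, filter spec taken from the example in this issue.
spec = "varname,32013,3,0,0,1072693248"
for argv in recompress_commands("in.nc", "tmp.nc", "out.nc", spec):
    print(" ".join(argv))
```

Assuming nccopy and ncdump are on the PATH, each list could be passed to `subprocess.run(argv, check=True)`; passing a list rather than a shell string also avoids having to quote a `*` variable name.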

The netCDF filter parameters are similar to yet distinct from how they're fed to h5repack. As a concrete example, suppose we want to use H5Z-ZFP in fixed-accuracy mode with a tolerance of 1.0. This would be specified to h5repack using

-f UD=32013,0,4,3,0,0,1072693248

where these numbers mean

32013: filter ID (zfp compression)
0: unused (h5repack only)
4: number of 32-bit unsigned integer compression parameters (cd_values) that follow (h5repack only)
3: zfp fixed-accuracy mode
0: unused
0,1072693248: two 32-bit unsigned integers representing a type-punned double-precision tolerance of 1.0 in little-endian order
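The two trailing integers can be checked with a short Python sketch that uses only the standard `struct` module. The function names `tolerance_to_cd_values` and `cd_values_to_tolerance` are made up for this illustration and are not part of H5Z-ZFP.

```python
import struct

def tolerance_to_cd_values(tol):
    """Type-pun a double into two 32-bit unsigned ints, little-endian order."""
    return struct.unpack("<2I", struct.pack("<d", tol))

def cd_values_to_tolerance(lo, hi):
    """Inverse: reassemble the double from its two little-endian 32-bit words."""
    return struct.unpack("<d", struct.pack("<2I", lo, hi))[0]

print(tolerance_to_cd_values(1.0))             # (0, 1072693248)
print(cd_values_to_tolerance(0, 1072693248))   # 1.0
```

The value 1072693248 is 0x3FF00000, the upper half of the IEEE 754 bit pattern for 1.0; the lower half is all zeros.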

With nccopy, you don't need the 0 following the filter ID, nor do you specify the number of cd_values. Rather, you would provide this:

-F varname,32013,3,0,0,1072693248

You have to tell nccopy the name of the variable (dataset) you want to apply the filter to. You can also specify * for varname to apply compression to all variables, though I believe H5Z-ZFP will fail on certain types, e.g., chars. After the filter ID, you specify only the actual cd_values given to h5repack.
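The difference between the two argument formats can be summarized in a minimal Python sketch. The helper names `h5repack_arg` and `nccopy_arg` are hypothetical; the field layout is taken only from the description in this issue, not from the tools' source code.

```python
ZFP_FILTER_ID = 32013  # H5Z-ZFP filter ID, as used in this issue

def h5repack_arg(cd_values, filter_id=ZFP_FILTER_ID):
    # h5repack: ID, an unused 0, the count of cd_values, then the values.
    fields = [filter_id, 0, len(cd_values), *cd_values]
    return "UD=" + ",".join(str(v) for v in fields)

def nccopy_arg(varname, cd_values, filter_id=ZFP_FILTER_ID):
    # nccopy: variable name, ID, then the values (no 0, no count).
    fields = [varname, filter_id, *cd_values]
    return ",".join(str(v) for v in fields)

cd_values = [3, 0, 0, 1072693248]  # fixed-accuracy mode, tolerance 1.0
print("-f", h5repack_arg(cd_values))           # -f UD=32013,0,4,3,0,0,1072693248
print("-F", nccopy_arg("varname", cd_values))  # -F varname,32013,3,0,0,1072693248
```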

One nice thing about nccopy is that it understands how to do type punning. The above example could also be specified as

-F varname,32013,3,0,1.0d

Here 1.0d is interpreted as a double-precision number. This works fine on little-endian machines; my reading of the netCDF documentation is that this would not work correctly on a big-endian machine, but who has one of those these days?
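To see why byte order matters here, the following Python sketch splits the same double into two 32-bit words under each byte order. It only illustrates how the word order of a type-punned double depends on endianness; it does not show what nccopy itself does on a big-endian machine. The function name `punned_words` is made up for this example.

```python
import struct
import sys

def punned_words(tol, byteorder):
    """Split a double into two 32-bit unsigned ints in the given byte order."""
    prefix = "<" if byteorder == "little" else ">"
    return struct.unpack(prefix + "2I", struct.pack(prefix + "d", tol))

print(punned_words(1.0, "little"))  # (0, 1072693248)
print(punned_words(1.0, "big"))     # (1072693248, 0)
print(sys.byteorder)                # byte order of the machine running this
```

The two words trade places, so a cd_values list written out by hand in little-endian order would not match what a native type pun produces on a big-endian host.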

Perhaps a short section "Using H5Z-ZFP Plugin with nccopy" can be added to the documentation? Maybe even a separate netCDF tool like print_h5repack_farg can be provided, or have print_h5repack_farg print both h5repack and nccopy arguments.

markcmiller86 commented 1 month ago

Too bad you didn't find this, https://github.com/LLNL/H5Z-ZFP/issues/143

lindstro commented 1 month ago

I guess I had already forgotten as I did participate on that thread. :-) But #143 deals only with how to support compression programmatically--it says nothing about how to use the CLI tools, which I suspect is the more common use case. For instance, how often do you call zlib vs. compress a file using gzip?