Open leighorf opened 1 year ago
This is a very reasonable request, @leighorf. That information is encoded, without loss, in the dataset's creation cd_values as the ZFP stream's header.
I think it's probably best to add a function to the H5Z-ZFP library interface for this. It requires a combination of HDF5 and ZFP library calls.
In lieu of such a function, given an existing dataset id of dsid, I think it is possible to do something like...
hid_t cpid = H5Dget_create_plist(dsid);
unsigned int flags;
size_t nelemts = 10;
unsigned cd_vals[10];
H5Pget_filter_by_id2(cpid, H5Z_FILTER_ZFP, &flags, &nelemts, cd_vals, ...);
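// cd_vals contains, starting at entry index 1, the ZFP stream header. So, now, open that as a bitstream...
// (nelemts was updated by the call above; skip entry 0 when sizing the buffer)
bitstream *dummy_bstr = stream_open(&cd_vals[1], (nelemts - 1) * sizeof(cd_vals[0]));
zfp_stream *dummy_zstr = zfp_stream_open(dummy_bstr);
// read the header so the stream picks up the encoded parameters and dimensionality
zfp_field *dummy_zfld = zfp_field_alloc();
zfp_read_header(dummy_zstr, dummy_zfld, ZFP_HEADER_FULL);
uint dim = zfp_field_dimensionality(dummy_zfld);
// now, query stream for info you seek...
zfp_mode zm = zfp_stream_compression_mode(dummy_zstr);
double rate = zfp_stream_rate(dummy_zstr, dim);
double accuracy = zfp_stream_accuracy(dummy_zstr);
uint precision = zfp_stream_precision(dummy_zstr);
zfp_field_free(dummy_zfld);
zfp_stream_close(dummy_zstr);
stream_close(dummy_bstr);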
@brtnfld and @leighorf I am about 1/2 way through having this completed. Maybe a little more than that.
I just realized, however, I don't fully understand all the context(s) in which retrieving ZFP encoding params would be needed. Here are some of the ways I am thinking...
int H5Z_zfp_get_mode(hid_t dsid);
double H5Z_zfp_get_accuracy(hid_t dsid);
double H5Z_zfp_get_rate(hid_t dsid);
int H5Z_zfp_get_precision(hid_t dsid);
int H5Z_zfp_get_reversible(hid_t dsid);
int H5Z_zfp_get_expert(hid_t dsid, unsigned int *minbits, unsigned int *maxbits, unsigned int *maxprec, int *minexp);
where all of the above return a negative value when the requested parameter(s) are not available.
Given the cd_vals data associated with the dataset header, obtained via something like H5Pget_filter_by_id2():
int H5Z_zfp_get_mode(int nvals, unsigned int *cd_vals);
double H5Z_zfp_get_accuracy(int nvals, unsigned int *cd_vals);
double H5Z_zfp_get_rate(int nvals, unsigned int *cd_vals);
int H5Z_zfp_get_precision(int nvals, unsigned int *cd_vals);
int H5Z_zfp_get_reversible(int nvals, unsigned int *cd_vals);
int H5Z_zfp_get_expert(int nvals, unsigned int *cd_vals, unsigned int *minbits, unsigned int *maxbits, unsigned int *maxprec, int *minexp);
With h5ls or h5dump, where the cd_vals associated with a dataset are printed, as in h5ls -vlrd | grep ZFP | decode_zfp_cdvals
I would suggest essentially duplicating the current zfp API for querying these parameters. It's probably not a good idea for the H5Z_zfp functions to do this in a slightly different way.
Another possibility is to piggyback on the zfp_config struct available as of zfp 1.0.0. Unfortunately, functions for querying a config struct are currently missing. These will be added in the next release.
You mean for querying an already compressed dataset?
I think if callers want to use the ZFP library interface, then all we should provide is a means to obtain a zfp_stream* object to use in those calls, and they can just use them. In fact, that might be a better way to go, since they have to link to ZFP either way to get that information.
Related to this, I just realized yesterday that H5Z-ZFP mode integers don't map 1:1 to ZFP's mode enums. For example, in H5Z-ZFP a mode of 3 is accuracy mode, whereas in ZFP it's 4.
> You mean for querying an already compressed dataset?
Well, yes, but more generally getting a zfp_config struct from a zfp_stream. The C++ compressed-array class API allows you to set the compression parameters of a zfp_stream by passing a zfp_config, e.g., via const_array::set_config(const zfp_config &config), but the high-level C API currently lacks functions for setting/getting zfp_stream parameters via zfp_config.
> I think if callers want to use ZFP library interface, then all we should provide is a means to obtain a zfp_stream* object to use in those calls and they can just use them. In fact, that might be better way to go since they have to link to ZFP either way to get that information.
True, that might be a more general approach. I don't know if there are any cases where you manipulate a zfp_stream but H5Z-ZFP ignores those changes, which might lead to unexpected results. The execution policy is one such setting. We should discuss how we want to support that and other zfp_stream settings going forward.
> Related to this, I just realized yesterday that H5Z-ZFP mode integers don't map 1:1 to ZFP's mode enums. For example in H5Z-ZFP, mode of 3 is accuracy mode whereas in ZFP its 4.
I don't think there's much we can do about that now without breaking things.
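For anyone who has to translate between the two numbering schemes in the meantime, a small lookup keeps the mismatch explicit. This is only a sketch: the constants below are local stand-ins whose values are my assumption of what the H5Z-ZFP and ZFP headers define (only the accuracy pair, 3 versus 4, is confirmed in this thread), and the sketch_* names are hypothetical, not part of either library. Real code should use the macros and enums from the headers instead.

```c
#include <assert.h>

/* ASSUMED values, copied here only for illustration; real code should use
 * the macros/enums from the H5Z-ZFP and ZFP headers instead. */
enum sketch_h5z_mode {
    SKETCH_H5Z_RATE = 1, SKETCH_H5Z_PRECISION = 2, SKETCH_H5Z_ACCURACY = 3,
    SKETCH_H5Z_EXPERT = 4, SKETCH_H5Z_REVERSIBLE = 5
};
enum sketch_zfp_mode {
    SKETCH_ZFP_NULL = 0, SKETCH_ZFP_EXPERT = 1, SKETCH_ZFP_RATE = 2,
    SKETCH_ZFP_PRECISION = 3, SKETCH_ZFP_ACCURACY = 4, SKETCH_ZFP_REVERSIBLE = 5
};

/* Hypothetical helper: H5Z-ZFP mode integer -> ZFP mode (NULL mode if unknown). */
int sketch_h5z_to_zfp_mode(int h5z_mode)
{
    switch (h5z_mode) {
    case SKETCH_H5Z_RATE:       return SKETCH_ZFP_RATE;
    case SKETCH_H5Z_PRECISION:  return SKETCH_ZFP_PRECISION;
    case SKETCH_H5Z_ACCURACY:   return SKETCH_ZFP_ACCURACY;
    case SKETCH_H5Z_EXPERT:     return SKETCH_ZFP_EXPERT;
    case SKETCH_H5Z_REVERSIBLE: return SKETCH_ZFP_REVERSIBLE;
    default:                    return SKETCH_ZFP_NULL;
    }
}

/* Hypothetical helper: ZFP mode -> H5Z-ZFP mode integer (-1 if no equivalent). */
int sketch_zfp_to_h5z_mode(int zfp_mode)
{
    switch (zfp_mode) {
    case SKETCH_ZFP_RATE:       return SKETCH_H5Z_RATE;
    case SKETCH_ZFP_PRECISION:  return SKETCH_H5Z_PRECISION;
    case SKETCH_ZFP_ACCURACY:   return SKETCH_H5Z_ACCURACY;
    case SKETCH_ZFP_EXPERT:     return SKETCH_H5Z_EXPERT;
    case SKETCH_ZFP_REVERSIBLE: return SKETCH_H5Z_REVERSIBLE;
    default:                    return -1;
    }
}

/* Sanity check: every H5Z-ZFP mode survives a round trip through the ZFP numbering. */
void sketch_check_round_trip(void)
{
    int m;
    for (m = SKETCH_H5Z_RATE; m <= SKETCH_H5Z_REVERSIBLE; m++)
        assert(sketch_zfp_to_h5z_mode(sketch_h5z_to_zfp_mode(m)) == m);
}
```

Keeping the translation in one place means the existing H5Z-ZFP integers stay untouched, so nothing already written to files would break.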
> It's probably not a good idea for the H5Z_zfp functions to do this in a slightly different way.
In this comment, were you basically speaking to how I proposed to handle the return values for error or n/a cases? If so, I agree.
Right. The zfp library already has those same functions (with different names, of course), so it would make sense for H5Z-ZFP to just wrap those and use the same parameters and return values.
@leighorf I finally have a prototype implementation for this on branch feat-mcm86-04mar23-retrieve-zfp-params and wonder if you could take a look.
You can see an example of how it works for a dataset already written to a file here..
If the caller knows nothing, it must first query for mode and then based on that, query for remaining params. If you know mode, you can avoid having to query twice. It is an error to query for zfp parameters that do not match the mode. So, if mode is accuracy but precision is queried, that will generate an error.
The caller is responsible for obtaining the desired dataset's creation property list id and passing that to H5Pget_zfp_XXX().
The implementation will handle any case: the property list is using bona fide HDF5 properties, the property list is using generic properties before the dataset has ever been written, or the dataset has been written.
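To make the query-mode-first pattern concrete, here is a rough, self-contained sketch. It does not use the prototype branch's functions, whose exact signatures are not shown in this thread. Instead, mock_plist is a made-up stand-in for a property list, and every mock_* function is hypothetical. It only illustrates the convention described above: ask for the mode, then ask for the matching parameter, and treat a mismatched query as an error.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical mode codes for this sketch only. */
typedef enum { MOCK_MODE_RATE = 1, MOCK_MODE_PRECISION = 2, MOCK_MODE_ACCURACY = 3 } mock_mode;

/* Stand-in for the parameters a real property list would carry. */
typedef struct {
    mock_mode mode;
    double    rate;
    unsigned  precision;
    double    accuracy;
} mock_plist;

/* Step 1: query the mode; negative on error. */
int mock_get_mode(const mock_plist *pl) { return pl ? (int)pl->mode : -1; }

/* Step 2: query a parameter; negative if it does not match the stored mode. */
int mock_get_rate(const mock_plist *pl, double *rate)
{
    if (!pl || !rate || pl->mode != MOCK_MODE_RATE) return -1;
    *rate = pl->rate;
    return 0;
}
int mock_get_precision(const mock_plist *pl, unsigned *prec)
{
    if (!pl || !prec || pl->mode != MOCK_MODE_PRECISION) return -1;
    *prec = pl->precision;
    return 0;
}
int mock_get_accuracy(const mock_plist *pl, double *acc)
{
    if (!pl || !acc || pl->mode != MOCK_MODE_ACCURACY) return -1;
    *acc = pl->accuracy;
    return 0;
}

/* A caller that knows nothing up front: mode first, then the matching parameter. */
int mock_describe(const mock_plist *pl, char *buf, size_t len)
{
    double d;
    unsigned u;
    assert(buf != NULL && len > 0);
    switch (mock_get_mode(pl)) {
    case MOCK_MODE_RATE:
        if (mock_get_rate(pl, &d) < 0) return -1;
        snprintf(buf, len, "rate=%g", d);
        return 0;
    case MOCK_MODE_PRECISION:
        if (mock_get_precision(pl, &u) < 0) return -1;
        snprintf(buf, len, "precision=%u", u);
        return 0;
    case MOCK_MODE_ACCURACY:
        if (mock_get_accuracy(pl, &d) < 0) return -1;
        snprintf(buf, len, "accuracy=%g", d);
        return 0;
    default:
        return -1;
    }
}
```

A caller that already knows the dataset was written in accuracy mode could skip mock_get_mode() and call mock_get_accuracy() directly, which is the single-query shortcut mentioned above.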
@brtnfld I am just pinging you on this issue in case you wanted to have a look at the new functions I am working towards to retrieve ZFP compression parameters from a dataset's creation property list...
@markcmiller86 Just to make sure I understand how this is supposed to work: since the caller presumably does not already know what the mode is, you should call H5Pget_zfp to first query the mode and then make a second call where you supply corresponding pointers to compression parameters?
As an alternative, zfp 1.0.0 supports zfp_config, which would allow you to make a single call to get all this information. zfp_config is not available pre 1.0.0, but it might be nice to have H5Pget_zfp_config() as an alternative way of querying the mode and parameters when H5Z-ZFP is built with zfp 1.0.0.
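As a rough illustration of the single-call idea, the sketch below uses a tagged union loosely modeled on zfp_config. The cfg_params layout is my assumption and should be checked against zfp.h; cfg_mock_plist and the cfg_* functions are hypothetical stand-ins, not the proposed H5Pget_zfp_config() or any existing API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mode tag for this sketch only. */
typedef enum {
    CFG_MODE_NULL = 0, CFG_MODE_EXPERT, CFG_MODE_RATE,
    CFG_MODE_PRECISION, CFG_MODE_ACCURACY, CFG_MODE_REVERSIBLE
} cfg_mode;

/* Tagged union: 'mode' says which member of 'arg' is meaningful (ASSUMED layout). */
typedef struct {
    cfg_mode mode;
    union {
        double   rate;
        unsigned precision;
        double   tolerance;
        struct { unsigned minbits, maxbits, maxprec; int minexp; } expert;
    } arg;
} cfg_params;

/* Stand-in for a dataset creation property list that may or may not use ZFP. */
typedef struct {
    int        has_zfp;
    cfg_params stored;
} cfg_mock_plist;

/* One call returns mode and parameters together; negative if ZFP is not in use. */
int cfg_get_config(const cfg_mock_plist *pl, cfg_params *out)
{
    assert(out != NULL);
    if (pl == NULL || !pl->has_zfp) return -1;
    *out = pl->stored;
    return 0;
}

/* Example consumer: the tolerance in accuracy mode, a negative value otherwise. */
double cfg_tolerance_or_negative(const cfg_params *c)
{
    return (c != NULL && c->mode == CFG_MODE_ACCURACY) ? c->arg.tolerance : -1.0;
}
```

Compared with the two-step approach, the caller makes one call and then switches on the mode tag, so there is no way to query a parameter that does not match the mode.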
In addition to querying compression parameter settings through the library, it would be nice to have a command-line tool that decodes cd_values, i.e., that performs the inverse of what print_h5repack_farg does.
Hello,
I have gone through great pains to carry fixed accuracy parameter metadata with all of my conversions of data that use ZFP. I often operate on ZFP compressed data and compress the results, and I want to make sure my final accuracy parameters are OK given the original accuracy parameters.
However, it occurs to me that, at least for a saved ZFP-encoded HDF5 dataset, it should be possible to open an HDF5 file with ZFP-compressed data and retrieve the original floating-point representation of the accuracy parameter for each dataset (I know it is possible to do this with the zfp library). It is not evident how to do this with the H5Z-ZFP interface, but that is what I desire: the ability to retrieve the ZFP fixed-accuracy parameter of an H5Z-ZFP-compressed HDF5 dataset.