shaomeng opened this issue 6 months ago
I believe the `set_local` method is used only during an `H5Dcreate()` call, as an optional opportunity to set up whatever state the filter may need depending on things like the dataset's datatype class and/or size. For example, you can see what the HDF5 library itself does in `set_local` for the SZIP filter...
https://github.com/HDFGroup/hdf5/blob/develop/src/H5Zszip.c#L114-L238
or in the NBIT filter
https://github.com/HDFGroup/hdf5/blob/develop/src/H5Znbit.c#L749-L904
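To make the shape of this concrete, here is a minimal sketch of a `set_local` callback. The filter ID 32000 and the three-slot `cd_values` layout are made up for illustration; this is not H5Z-ZFP's implementation, just the general pattern those linked sources follow:

```c
/* Minimal sketch of a set_local callback (hypothetical filter ID and
 * cd_values layout). HDF5 invokes this during H5Dcreate(), after the
 * user's H5Pset_filter() call has already put the filter in the DCPL. */
#include <hdf5.h>

#define MY_FILTER_ID ((H5Z_filter_t)32000) /* hypothetical test-range ID */

static herr_t
my_set_local(hid_t dcpl_id, hid_t type_id, hid_t space_id)
{
    unsigned cd_values[3];

    /* Record dataset-specific facts the caller never passed explicitly:
     * element size and rank, discovered from the datatype and dataspace. */
    cd_values[0] = 1; /* hypothetical version tag for this layout */
    cd_values[1] = (unsigned)H5Tget_size(type_id);
    cd_values[2] = (unsigned)H5Sget_simple_extent_ndims(space_id);

    /* Overwrite the filter's stored cd_values in the dataset creation
     * property list; these are what get written into the dataset header
     * and handed back to the filter at decode time. */
    return H5Pmodify_filter(dcpl_id, MY_FILTER_ID, H5Z_FLAG_MANDATORY,
                            3, cd_values);
}
```

The key point is the `H5Pmodify_filter()` call at the end: that is how a `set_local` makes its dataset-specific parameters stick.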
In H5Z-ZFP, we convert the parameters specified by the user (either via generic `cd_values` or via the properties interface) to a ZFP stream header, and it is that ZFP stream header that gets stored as part of the dataset header. That's because in the initial versions of the filter, we (or maybe it was just me...I don't think @lindstro cared too much) were worried that the stream header could dominate HDF5 chunk overheads. But that concern turns out to be unrealistic, because nobody tends to run with really tiny chunks (and they shouldn't, either, due to the impact on I/O performance). So, in future versions of the filter, we may wind up just storing a separate ZFP stream header for each chunk.
So, `set_local()` can be a no-op, and for HDF5's built-in deflate filter, it is...
https://github.com/HDFGroup/hdf5/blob/develop/src/H5Zdeflate.c#L40-L42
> we (or maybe it was just me...I don't think @lindstro cared too much) were worried that the stream header could dominate HDF5 chunk overheads.
I don't want to derail the discussion, but when H5Z-ZFP was first developed some 10 years ago, I was indeed concerned about keeping the zfp header short in case we wanted to store it per HDF5 chunk (anticipating small chunks). So we devised an efficient way of encoding (common) compression parameters (compression mode + rate/precision/accuracy) and array metadata (dimensions, scalar type) in a single 64-bit word. Even if this turned out not to be important for H5Z-ZFP in the end, I've always envisioned other applications where you want to spatially adapt compression settings (e.g., to keep high accuracy only around features of interest), and our compact metadata encoding allows you to do that.
> I don't think @lindstro cared too much...
Sorry about that wording. What I was trying to say is that you probably already had figured out that the HDF5 chunks would have to have been mighty small before the ZFP stream header would become an issue...that isn't something I actually sat down to calculate until after I had already coded that aspect of the filter.
I didn't realize that the `cd_values[]` are stored per dataset instead of per chunk, and I now better understand how they're used in the case of H5Z-ZFP. I really appreciate the discussion!
As a fellow HDF5 plugin developer, I'd like a little more information on what the `set_local` function does and what purposes it serves. I'm confused because anyone invoking the plugin is already specifying compression parameters during the `H5Pset_filter()` call, so it seems to me that the `set_local()` function doesn't add any extra value.

I do notice that this page says that the `cd_values[]` passed in during `H5Pset_filter()` are modified. But the HDF5 documentation specifies that the `set_local()` function receives a private copy of the dataset creation property list and makes its modifications on that copy. What effect does the modification have if it's applied to a private copy?

I appreciate any discussion on this topic!
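For concreteness, the sequence I have in mind is something like the following (the filter ID 32000 and the parameter values are placeholders, not any particular plugin's):

```c
#include <hdf5.h>

/* The caller supplies cd_values here, before any dataset exists. */
static hid_t make_dcpl(void)
{
    hid_t    dcpl = H5Pcreate(H5P_DATASET_CREATE);
    hsize_t  chunk[2] = {64, 64};
    unsigned cd_values[2] = {4, 1}; /* placeholder compression parameters */

    H5Pset_chunk(dcpl, 2, chunk);
    H5Pset_filter(dcpl, (H5Z_filter_t)32000, H5Z_FLAG_MANDATORY,
                  2, cd_values);
    /* Only later, inside H5Dcreate(), does the filter's set_local() run
     * and get a chance to rewrite these cd_values for the actual dataset. */
    return dcpl;
}
```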