texadactyl closed 2 years ago
Huh, is there no way to tell whether a file uses the bitshuffle filter if you are just, say, opening it with the python h5py library? It seems like the code would have to know, in order to read it. I can't figure out how to do it though....
Under the covers, the HDF5 C library knows which filters were applied and inflates the data for you. There are C-level calls to find out for sure.
Come the rawspec revolution, do this:
import sys
import hdf5plugin  # importing this registers the bitshuffle filter with HDF5
import h5py

if __name__ == "__main__":
    n = len(sys.argv)
    if n != 2:
        print("Usage: {} <FBH5 File>".format(sys.argv[0]))
        sys.exit(86)
    inpath = sys.argv[1]
    h5file = h5py.File(inpath, "r")
    # The file-level attributes are fixed-length byte strings, hence decode().
    print("Rawspec version:", h5file.attrs["VERSION_RAWSPEC"].decode('utf-8'))
    print("Librawspec version:", h5file.attrs["VERSION_LIBRAWSPEC"].decode('utf-8'))
    print("cuFFT version:", h5file.attrs["VERSION_CUFFT"].decode('utf-8'))
    print("HDF version:", h5file.attrs["VERSION_HDF"].decode('utf-8'))
    print("Bitshuffle:", h5file.attrs["BITSHUFFLE"].decode('utf-8'))
    h5file.close()
Start of h5dump looks like this:
HDF5 "blc13_guppi_57991_49836_DIAG_FRB121102_0010.rawspec.0000.h5" {
GROUP "/" {
   ATTRIBUTE "BITSHUFFLE" {
      DATATYPE H5T_STRING {
         STRSIZE 7;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE SCALAR
      DATA {
      (0): "ENABLED"
      }
   }
   ATTRIBUTE "CLASS" {
      DATATYPE H5T_STRING {
         STRSIZE 10;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE SCALAR
      DATA {
      (0): "FILTERBANK"
      }
   }
   ATTRIBUTE "VERSION" {
      DATATYPE H5T_STRING {
         STRSIZE 3;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE SCALAR
      DATA {
      (0): "2.0"
      }
   }
   ATTRIBUTE "VERSION_CUFFT" {
      DATATYPE H5T_STRING {
         STRSIZE 10;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE SCALAR
      DATA {
      (0): "10.2.1.245"
      }
   }
   ATTRIBUTE "VERSION_HDF" {
      DATATYPE H5T_STRING {
         STRSIZE 6;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE SCALAR
      DATA {
      (0): "1.8.16"
      }
   }
   ATTRIBUTE "VERSION_LIBRAWSPEC" {
      DATATYPE H5T_STRING {
         STRSIZE 23;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE SCALAR
      DATA {
      (0): "2.5.0+88@g8770e76-dirty"
      }
   }
   ATTRIBUTE "VERSION_RAWSPEC" {
      DATATYPE H5T_STRING {
         STRSIZE 23;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE SCALAR
      DATA {
      (0): "2.5.0+88@g8770e76-dirty"
      }
   }
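The attributes in the dump above only exist in files written by a rawspec that sets them, so a reader may want to tolerate their absence. Here is a minimal sketch of that, assuming older files simply lack the attributes; the helper name read_attr and the "(not present)" placeholder are made up for illustration and are not part of rawspec or h5py:

```python
import h5py


def read_attr(h5file, name, default="(not present)"):
    """Return a file-level attribute as str, or `default` if it is missing.

    Hypothetical helper for illustration only.
    """
    value = h5file.attrs.get(name)
    if value is None:
        return default
    # Fixed-length HDF5 strings come back as bytes; decode them for printing.
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return str(value)


def describe(path):
    """Print the version and bitshuffle attributes of one FBH5 file."""
    names = ("VERSION_RAWSPEC", "VERSION_LIBRAWSPEC", "VERSION_CUFFT",
             "VERSION_HDF", "BITSHUFFLE")
    with h5py.File(path, "r") as h5file:
        for name in names:
            print("{}: {}".format(name, read_attr(h5file, name)))
```

Calling describe("some_file.h5") on a file without the attributes should print the placeholder for each name instead of raising KeyError.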
Huh, is there no way to tell whether a file uses the bitshuffle filter if you are just, say, opening it with the python h5py library?
Reading a compressed dataset is transparent unless you don't have the required decompression filters installed, in which case it will fail spectacularly. I imagine one could open the HDF5 file and then query properties of the datasets to find out whether they are compressed, but I don't know the details of how one would do that. From the command line, you can use h5dump to see whether any "filters" were used:
$ h5dump -H -A 0 -p -d data guppi_59385_58426_TIC387260717_0000.rawspec.0000.h5
HDF5 "guppi_59385_58426_TIC387260717_0000.rawspec.0000.h5" {
DATASET "data" {
   DATATYPE H5T_IEEE_F32LE
   DATASPACE SIMPLE { ( 16, 1, 67108864 ) / ( H5S_UNLIMITED, 1, 67108864 ) }
   STORAGE_LAYOUT {
      CHUNKED ( 1, 1, 67108864 )
      SIZE 3168789149 (1.355:1 COMPRESSION)
   }
   FILTERS {
      USER_DEFINED_FILTER {
         FILTER_ID 32008
         COMMENT bitshuffle; see https://github.com/kiyo-masui/bitshuffle
         PARAMS { 0 3 4 0 2 }
      }
   }
   FILLVALUE {
      FILL_TIME H5D_FILL_TIME_IFSET
      VALUE 0
   }
   ALLOCATION_TIME {
      H5D_ALLOC_TIME_INCR
   }
}
}
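The same filter information appears to be reachable from h5py through its low-level API, by asking the dataset's creation property list for its filter pipeline. The following is a rough sketch rather than a tested recipe for FBH5 files: the helper names list_filters and uses_bitshuffle are made up, the dataset name "data" and filter ID 32008 are taken from the h5dump output above, and it assumes that querying the pipeline does not require the bitshuffle plugin itself to be installed.

```python
import h5py

# Filter ID reported for bitshuffle in the h5dump output above.
BITSHUFFLE_FILTER_ID = 32008


def list_filters(dataset):
    """Return [(filter_id, name, params), ...] for an h5py Dataset."""
    plist = dataset.id.get_create_plist()  # dataset creation property list
    filters = []
    for i in range(plist.get_nfilters()):
        filter_id, _flags, params, name = plist.get_filter(i)
        if isinstance(name, bytes):
            name = name.decode("utf-8", "replace")
        filters.append((filter_id, name, tuple(params)))
    return filters


def uses_bitshuffle(dataset):
    """True if the dataset's filter pipeline contains the bitshuffle ID."""
    return any(fid == BITSHUFFLE_FILTER_ID
               for fid, _name, _params in list_filters(dataset))


def report(path, dataset_name="data"):
    """Print the filter pipeline of one dataset in an HDF5 file."""
    with h5py.File(path, "r") as h5file:
        dataset = h5file[dataset_name]
        for fid, name, params in list_filters(dataset):
            print("filter", fid, name, params)
        print("bitshuffle:", uses_bitshuffle(dataset))
```

For example, report("guppi_59385_58426_TIC387260717_0000.rawspec.0000.h5") would be expected to list filter 32008 for the file dumped above. Note that h5py's higher-level Dataset.compression property only names its built-in filters (gzip, lzf, szip), which is why this goes through the property list instead.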
In the output .h5 file, add file-level attributes for software versions:
In addition, show whether or not bitshuffle is enabled.