UCBerkeleySETI / rawspec

Add file level attributes for software versions #49

Closed: texadactyl closed this issue 2 years ago

texadactyl commented 2 years ago

In the output .h5 file, add file-level attributes for software versions.

In addition, show whether or not bitshuffle is enabled.
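rawspec itself is C/CUDA, so the following is only a Python sketch of the requested layout using h5py. Each version string is stored as a scalar string attribute on the root group, plus a BITSHUFFLE flag. The attribute names are the ones that appear in the h5dump output later in this thread. `write_version_attrs` is a hypothetical helper, and the exact HDF5 string padding h5py uses may differ from what rawspec's C code writes.

```python
import h5py
import numpy as np


def write_version_attrs(h5file, versions, bitshuffle_enabled):
    """Write scalar, fixed-length ASCII string attributes on the root group "/"."""
    for name, value in versions.items():
        # A numpy bytes scalar becomes a fixed-length HDF5 string sized to the value.
        h5file.attrs[name] = np.bytes_(value.encode("ascii"))
    # "ENABLED" is what the h5dump output in this thread shows;
    # "DISABLED" is an assumed spelling for the other case.
    flag = b"ENABLED" if bitshuffle_enabled else b"DISABLED"
    h5file.attrs["BITSHUFFLE"] = np.bytes_(flag)
```

For example, on a file opened with `h5py.File(path, "w")`, calling `write_version_attrs(f, {"VERSION_HDF": h5py.version.hdf5_version}, bitshuffle_enabled=True)` records the HDF5 version that h5py is linked against. The rawspec and cuFFT strings would come from rawspec's own build information.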

lacker commented 2 years ago

Huh, is there no way to tell whether a file uses the bitshuffle filter if you are just, say, opening it with the python h5py library? It seems like the code would have to know in order to read it. I can't figure out how to do it, though.

texadactyl commented 2 years ago

Under the covers, the HDF5 C library knows which filters a dataset uses, and the data is decompressed ("inflated") transparently on read. There are C-level calls to find out for sure.

Come the rawspec revolution, do this:

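```python
import sys

import hdf5plugin  # noqa: F401  (imported for its side effect: registers bitshuffle and other filters with h5py)
import h5py


def attr_str(h5file, name):
    """Return a root-group string attribute as str (fixed-length strings are read as bytes)."""
    value = h5file.attrs[name]
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage:  {}  <FBH5 File>".format(sys.argv[0]))
        sys.exit(86)
    inpath = sys.argv[1]
    # Open read-only; the with-block closes the file on exit.
    with h5py.File(inpath, "r") as h5file:
        print("Rawspec version:", attr_str(h5file, "VERSION_RAWSPEC"))
        print("Librawspec version:", attr_str(h5file, "VERSION_LIBRAWSPEC"))
        print("cuFFT version:", attr_str(h5file, "VERSION_CUFFT"))
        print("HDF version:", attr_str(h5file, "VERSION_HDF"))
        print("Bitshuffle:", attr_str(h5file, "BITSHUFFLE"))
```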
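The C-level calls texadactyl mentions are presumably `H5Pget_nfilters()` and `H5Pget_filter2()` on the dataset creation property list, and h5py exposes equivalents in its low-level API. This makes it possible to answer lacker's question without relying on the BITSHUFFLE attribute. The following is a minimal sketch that assumes the dataset is named `data`, as in the h5dump output in this thread. `dataset_filters` and `uses_bitshuffle` are hypothetical helper names, and 32008 is the FILTER_ID that h5dump reports for bitshuffle.

```python
import sys

import h5py

# FILTER_ID that h5dump reports for bitshuffle.
BITSHUFFLE_FILTER_ID = 32008


def dataset_filters(dset):
    """Return [(filter_id, name), ...] for a dataset's filter pipeline, in order."""
    plist = dset.id.get_create_plist()  # dataset creation property list
    filters = []
    for i in range(plist.get_nfilters()):
        # Each entry is (filter_code, flags, client_data_values, name).
        code, _flags, _values, name = plist.get_filter(i)
        if isinstance(name, bytes):
            name = name.decode("ascii", "replace")
        filters.append((code, name))
    return filters


def uses_bitshuffle(dset):
    """True if the bitshuffle filter appears anywhere in the dataset's pipeline."""
    return any(code == BITSHUFFLE_FILTER_ID for code, _name in dataset_filters(dset))


if __name__ == "__main__" and len(sys.argv) == 2:
    with h5py.File(sys.argv[1], "r") as h5file:
        dset = h5file["data"]  # assumed dataset name, as in the h5dump output
        print("Filters:", dataset_filters(dset))
        print("Bitshuffle:", uses_bitshuffle(dset))
```

This inspects the file's actual filter pipeline rather than a self-reported flag, so it also works on files written before the version attributes existed.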
texadactyl commented 2 years ago

The start of the h5dump output looks like this:

```
HDF5 "blc13_guppi_57991_49836_DIAG_FRB121102_0010.rawspec.0000.h5" {
GROUP "/" {
   ATTRIBUTE "BITSHUFFLE" {
      DATATYPE  H5T_STRING {
         STRSIZE 7;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "ENABLED"
      }
   }
   ATTRIBUTE "CLASS" {
      DATATYPE  H5T_STRING {
         STRSIZE 10;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "FILTERBANK"
      }
   }
   ATTRIBUTE "VERSION" {
      DATATYPE  H5T_STRING {
         STRSIZE 3;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "2.0"
      }
   }
   ATTRIBUTE "VERSION_CUFFT" {
      DATATYPE  H5T_STRING {
         STRSIZE 10;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "10.2.1.245"
      }
   }
   ATTRIBUTE "VERSION_HDF" {
      DATATYPE  H5T_STRING {
         STRSIZE 6;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "1.8.16"
      }
   }
   ATTRIBUTE "VERSION_LIBRAWSPEC" {
      DATATYPE  H5T_STRING {
         STRSIZE 23;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "2.5.0+88@g8770e76-dirty"
      }
   }
   ATTRIBUTE "VERSION_RAWSPEC" {
      DATATYPE  H5T_STRING {
         STRSIZE 23;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "2.5.0+88@g8770e76-dirty"
      }
   }
```
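h5dump prints every attribute on the root group. The following is a rough h5py equivalent and only a minimal sketch: `root_attrs` is a hypothetical helper that collects all root attributes into a dict of strings. Unlike the script above, it does not raise a KeyError on files written before these attributes existed; missing versions simply do not appear.

```python
import sys

import h5py


def root_attrs(path):
    """Return every attribute on the root group "/" as a {name: str} dict."""
    result = {}
    with h5py.File(path, "r") as h5file:
        for name, value in h5file.attrs.items():
            # Fixed-length strings come back as (numpy) bytes; decode those.
            if isinstance(value, bytes):
                value = value.decode("utf-8")
            result[name] = str(value)
    return result


if __name__ == "__main__" and len(sys.argv) == 2:
    for name, value in sorted(root_attrs(sys.argv[1]).items()):
        print("{} = {}".format(name, value))
```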
david-macmahon commented 2 years ago

> Huh, is there no way to tell whether a file uses the bitshuffle filter if you are just, say, opening it with the python h5py library?

Reading a compressed dataset is transparent unless you don't have the required decompression filters installed, in which case it will fail spectacularly. I imagine one could open the HDF5 file and then query properties of the datasets to find out whether they are compressed, but I don't know the details of how one would do that. From the command line, you can use h5dump to see whether any "filters" were used:

```
$ h5dump -H -A 0 -p -d data guppi_59385_58426_TIC387260717_0000.rawspec.0000.h5
HDF5 "guppi_59385_58426_TIC387260717_0000.rawspec.0000.h5" {
DATASET "data" {
   DATATYPE  H5T_IEEE_F32LE
   DATASPACE  SIMPLE { ( 16, 1, 67108864 ) / ( H5S_UNLIMITED, 1, 67108864 ) }
   STORAGE_LAYOUT {
      CHUNKED ( 1, 1, 67108864 )
      SIZE 3168789149 (1.355:1 COMPRESSION)
   }
   FILTERS {
      USER_DEFINED_FILTER {
         FILTER_ID 32008
         COMMENT bitshuffle; see https://github.com/kiyo-masui/bitshuffle
         PARAMS { 0 3 4 0 2 }
      }
   }
   FILLVALUE {
      FILL_TIME H5D_FILL_TIME_IFSET
      VALUE  0
   }
   ALLOCATION_TIME {
      H5D_ALLOC_TIME_INCR
   }
}
}
```
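The SIZE line is consistent with the dataspace. The dataset holds 16 × 1 × 67108864 float32 values × 4 bytes = 4,294,967,296 bytes uncompressed, and 4,294,967,296 / 3,168,789,149 ≈ 1.355. The sketch below computes the same kind of ratio from h5py. It also checks whether a filter with ID 32008 (bitshuffle) is registered, which is what decides whether reading such a dataset succeeds or fails. The helper names are hypothetical. `dset.id.get_storage_size()` and `h5py.h5z.filter_avail()` are h5py's wrappers around the corresponding HDF5 C calls.

```python
import h5py


def compression_ratio(dset):
    """Uncompressed size divided by the bytes actually allocated in the file."""
    logical_bytes = dset.size * dset.dtype.itemsize
    stored_bytes = dset.id.get_storage_size()
    # No storage allocated yet (nothing written) -> ratio is undefined.
    return logical_bytes / stored_bytes if stored_bytes else float("nan")


def bitshuffle_available():
    """True if a filter with ID 32008 (bitshuffle) is registered with HDF5."""
    return bool(h5py.h5z.filter_avail(32008))
```

For the file above, `compression_ratio(f["data"])` should come out near the 1.355 that h5dump reports. `bitshuffle_available()` is typically True once `hdf5plugin` has been imported, since importing it registers its filters.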