LDMX-Software / fire

Event-by-event processing framework using HDF5 and C++17
https://ldmx-software.github.io/fire/
GNU General Public License v3.0

Boolean Serialization #10

Closed tomeichlersmith closed 2 years ago

tomeichlersmith commented 2 years ago

Currently, we are just serializing bools into shorts. This is not a very satisfactory solution, especially given the data copying necessary to get around the std::vector<bool> specialization in C++.

The solution is to implement a bool<->enum mapping and serialize the enum. This has already been done by h5py and would mean that opening a boolean dataset in h5py would work 'out of the box'.
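A minimal sketch of what such a mapping could look like, assuming an 8-bit signed enum with FALSE = 0 and TRUE = 1. The helper names (`to_enum`, `from_enum`, `to_enum_buffer`, `from_enum_buffer`) are hypothetical and are not part of fire or HighFive. Note that this sketch still copies the data, because std::vector<bool> is bit-packed and offers no contiguous buffer to hand to a writer; it only changes the element type that ends up on disk.

```cpp
#include <cassert>
#include <vector>

// Assumed layout: one signed byte, FALSE = 0, TRUE = 1.
enum class BOOL : signed char { FALSE = 0, TRUE = 1 };

// Hypothetical helpers for the bool <-> enum mapping.
constexpr BOOL to_enum(bool b) { return b ? BOOL::TRUE : BOOL::FALSE; }

inline bool from_enum(BOOL e) {
  // Guard against values that are neither of the two enumerators.
  assert(e == BOOL::TRUE || e == BOOL::FALSE);
  return e == BOOL::TRUE;
}

// Copy a bit-packed std::vector<bool> into a contiguous enum buffer
// that a serializer could write element by element.
std::vector<BOOL> to_enum_buffer(const std::vector<bool>& in) {
  std::vector<BOOL> out;
  out.reserve(in.size());
  for (bool b : in) out.push_back(to_enum(b));
  return out;
}

// Inverse copy, for data read back from disk.
std::vector<bool> from_enum_buffer(const std::vector<BOOL>& in) {
  std::vector<bool> out;
  out.reserve(in.size());
  for (BOOL e : in) out.push_back(from_enum(e));
  return out;
}
```

Calling `to_enum_buffer` on a std::vector<bool> gives a std::vector<BOOL> of the same length, and `from_enum_buffer` recovers the original values.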

tomeichlersmith commented 2 years ago

https://github.com/BlueBrain/HighFive/blob/04793312009943e46a917ed113446b4ab1ab5379/tests/unit/tests_high_five_base.cpp#L1607-L1666

tomeichlersmith commented 2 years ago

Using h5dump, I was able to deduce the HDF5 datatype that h5py uses for serializing bools:

      DATATYPE  H5T_ENUM {
         H5T_STD_I8LE;
         "FALSE"            0;
         "TRUE"             1;
      }

In C++ land, this is

enum class BOOL : signed char {
  FALSE = 0,
  TRUE = 1
};

I tested this by writing bools-py.h5 with write-bools.py (below) and bools-cpp.h5 with the executable compiled from write-bools.cxx (below), then reading both with read-bools.py (below). h5py read both files seamlessly, whether they were written by C++ or Python, and interpreted the data as Python bools.

Now I just need to figure out how to put this enum type into the type deduction tree that is currently in fire.
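One way this could be approached is a small trait that maps each in-memory type to the type that is actually stored, with a specialization for bool. The sketch below is only an illustration under that assumption: the `storage` trait and its `to`/`from` members are hypothetical names and do not reflect how fire's type deduction is actually organized.

```cpp
#include <type_traits>

// Same enum as above: one signed byte, FALSE = 0, TRUE = 1.
enum class BOOL : signed char { FALSE = 0, TRUE = 1 };

// Compile-time checks that the enum matches H5T_STD_I8LE (8-bit, signed).
static_assert(sizeof(BOOL) == 1, "BOOL must occupy one byte");
static_assert(std::is_signed<std::underlying_type<BOOL>::type>::value,
              "BOOL must have a signed underlying type");

// Hypothetical trait: by default a type is stored as itself.
template <typename T>
struct storage {
  using type = T;
  static constexpr type to(T value) { return value; }
  static constexpr T from(type stored) { return stored; }
};

// Specialization: bool is stored as the BOOL enum.
template <>
struct storage<bool> {
  using type = BOOL;
  static constexpr type to(bool value) {
    return value ? BOOL::TRUE : BOOL::FALSE;
  }
  static constexpr bool from(type stored) { return stored == BOOL::TRUE; }
};
```

With this shape, generic code would ask for `storage<T>::type` when creating a dataset and pass values through `storage<T>::to` and `storage<T>::from`, so only the bool specialization needs to know about the enum.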

write-bools.py

import h5py
import numpy as np
with h5py.File('bools-py.h5','w') as f :
    # a dataset created with dtype=bool is stored by h5py as an enum
    dset = f.create_dataset('mybools',(10,),dtype=bool)
    dset[...] = np.full((10,),True)

write-bools.cxx

Compile with h5c++ to avoid extra linking parameters. I did this inside the hdf5 container, where HighFive is installed on the system path.


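#include <iostream>
#include <vector>

#include <highfive/H5DataSet.hpp>
#include <highfive/H5DataSpace.hpp>
#include <highfive/H5DataType.hpp>
#include <highfive/H5File.hpp>
using namespace HighFive;

// Same layout as the enum h5py writes for bools: 8-bit signed, FALSE=0, TRUE=1
enum class BOOL : signed char {
  FALSE = 0,
  TRUE  = 1
};

// Describe the enum members to HighFive and register it as the H5 type for BOOL
EnumType<BOOL> create_enum_bool() {
  return {{"FALSE", BOOL::FALSE},{"TRUE", BOOL::TRUE}};
}
HIGHFIVE_REGISTER_TYPE(BOOL, create_enum_bool)

int main() try {
  File f("bools-cpp.h5", File::ReadWrite | File::Create | File::Truncate);

  std::vector<BOOL> data = {BOOL::TRUE, BOOL::TRUE, BOOL::TRUE};
  std::cout << data.size() << std::endl;
  // Create the dataset with the enum datatype, then write the values
  auto dset = f.createDataSet("mybools", DataSpace(data.size()), create_enum_bool());
  dset.write(data);

  f.flush();
  return 0;
} catch (const Exception& e) {
  std::cerr << " [H5 Error] : " << e.what() << std::endl;
  return 1;
}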

read-bools.py

import h5py
import sys
# open read-only; the file to read is given as the first command-line argument
with h5py.File(sys.argv[1], 'r') as f :
    print(f['mybools'][...])