BlueBrain / HighFive

HighFive - Header-only C++ HDF5 interface
https://bluebrain.github.io/HighFive/
Boost Software License 1.0

Unable to write an array of compound types with `std::string` #488

Status: Open. kerim371 opened this issue 2 years ago.

kerim371 commented 2 years ago

Describe the bug

I have a compound type that includes a std::string member. When I create an array of it (std::vector<CompoundTypeWithString>) and try to write it, HDF5 crashes with a segmentation fault:

Program received signal SIGSEGV, Segmentation fault.
0x00005555558bd344 in H5T__vlen_mem_str_getlen (file=<optimized out>, _vl=0x7ffff78e6038, len=0x7fffffffcf28) at .../HDF5/src/H5Tvlen.c:617
617     *len = HDstrlen(s);

To Reproduce

Here is example code that reproduces the problem (if you remove the second element from the array, the executable runs without problems):

#include <string>
#include <vector>

#include <highfive/H5File.hpp>
#include <highfive/H5DataSet.hpp>
#include <highfive/H5DataSpace.hpp>
#include <highfive/H5DataType.hpp>

using namespace HighFive;

typedef struct {
    double x;
    double y;
    double z;
    std::string name;
} CT;

CompoundType create_compound_CT() {
    CompoundType t(
        {
            {"x", AtomicType<double>{}},
            {"y", AtomicType<double>{}},
            {"z", AtomicType<double>{}},
            {"name", AtomicType<std::string>{}}
        });
    return t;
}

HIGHFIVE_REGISTER_TYPE(CT, create_compound_CT)

int main(int, char**) {
    File file("compound.h5", File::ReadWrite | File::Create | File::Truncate);

    CompoundType t = create_compound_CT();
    t.commit(file, "CT");

    std::vector<CT> data = {
      {1, 1, 1, "one"},
      {2, 2, 2, "two"}  // with only one element in the array, the program runs successfully
    };

    auto dataset = file.createDataSet("data", DataSpace::From(data), t);
    dataset.write(data);  // segmentation fault happens here

    return 0;
}

Expected behavior

I expect to be able to embed a std::string in a compound type and write it.


pramodk commented 2 years ago

Maybe @ferdonline remembers something about compatibility issues between the string type and compound data types?

kerim371 commented 2 years ago

@pramodk, thank you for the response.

I think I have found a way to partially work around this. The idea is to keep a const char * that points to the data of the std::string member variable and, when doing I/O with HDF5, to explicitly specify the offset of that const char * using HOFFSET.

The problem arises after reading the data: the memory HDF5 allocated for the variable-length strings has to be freed with the HDF5 H5Treclaim() function. I have to do that manually, whereas HighFive calls it inside data_converter() when reading into std::string and std::vector<std::string>, so in that case HighFive protects us from memory leaks.

I guess it would be possible to add a data_converter for compound types such as Compound and std::vector<Compound> that checks, using the HDF5 C API, whether the compound has variable-length string members and computes the correct offsets for them. But that would require significant work (especially computing the offsets and copying the strings from a temporary buffer into the user's variable), so it is not easy and may not be possible at all.

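#include <algorithm>  // std::copy
#include <string>

#include <H5Tpublic.h>              // HOFFSET
#include <highfive/H5DataType.hpp>  // CompoundType, AtomicType

using namespace HighFive;

struct Point {
  Point() = default;
  Point(const double& x,
        const double& y,
        const double& z)
  {
    this->p[0] = x;
    this->p[1] = y;
    this->p[2] = z;
  }

  // cname must point into this object's own string: the implicitly generated
  // copy operations would copy the pointer and leave it aimed at the source.
  Point(const Point& other) {
    std::copy(other.p, other.p + 3, p);
    setName(other.getName());
  }
  Point& operator=(const Point& other) {
    if (this != &other) {
      std::copy(other.p, other.p + 3, p);
      setName(other.getName());
    }
    return *this;
  }

  void setX(const double& x) { p[0] = x; }
  void setY(const double& y) { p[1] = y; }
  void setZ(const double& z) { p[2] = z; }

  double& x() { return p[0]; }
  double& y() { return p[1]; }
  double& z() { return p[2]; }

  void setName(const std::string& name) {
    this->name = name;
    this->cname = this->name.c_str();
  }
  std::string getName() const {
    if (this->cname == nullptr)
      return std::string();
    return std::string(this->cname);
  }

  double p[3] = {0, 0, 0};

private:
  std::string name;
  const char *cname = name.c_str();  // the member HDF5 reads and writes

  friend CompoundType compound_Point();
};

inline CompoundType compound_Point() {
  CompoundType t(
        {
          {"x", AtomicType<double>{}, HOFFSET(Point, p[0])},
          {"y", AtomicType<double>{}, HOFFSET(Point, p[1])},
          {"z", AtomicType<double>{}, HOFFSET(Point, p[2])},
          {"name", AtomicType<std::string>{}, HOFFSET(Point, cname)},
        }, sizeof(Point));

  return t;
}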

As you can see, setName() copies the string into the std::string member variable and points the const char * at it. Read and write operations go through the const char * member.