BlueBrain / HighFive

HighFive - Header-only C++ HDF5 interface
https://bluebrain.github.io/HighFive/
Boost Software License 1.0

Potential bug #1008

Closed: artiomn closed this issue 3 months ago

artiomn commented 3 months ago

Describe the bug

Class HighFive::File is inherited from HighFive::Object: https://github.com/BlueBrain/HighFive/blob/master/include/highfive/H5File.hpp#L23

But the HighFive::Object class has a non-virtual destructor, which can lead to unexpected behavior: https://github.com/BlueBrain/HighFive/blob/master/include/highfive/H5Object.hpp#L60
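For context, a simplified paraphrase of the relevant declarations (condensed from the linked headers, not the exact HighFive source):

namespace HighFive {

class Object {
  public:
    ~Object();  // non-virtual: deleting a derived object through an Object* is UB
    // ...
};

class File : public Object /* , trait mixins elided */ {
    // ...
};

}  // namespace HighFive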

1uc commented 3 months ago

I'm not sure I'm following. A virtual destructor is needed when deleting through a base pointer, e.g.:

struct Base {};              // non-virtual destructor
struct Derived : Base {};

void delete_via_base_ptr(Base* base) {
    delete base;             // UB if *base is actually a Derived
}

Derived* derived = new Derived();
delete_via_base_ptr(derived);

This is often desirable for polymorphic classes. However, neither File nor Object has any virtual methods, so the example above is not something one would implement for File and Object.

Would you mind providing an example explaining the unexpected behaviour you're alluding to?

artiomn commented 3 months ago

Yes, a virtual destructor is needed when objects are deleted through a base-class pointer. But I'm not saying that this is 100% a bug in your code.

It's worse than that, because other classes (groups, datasets, etc.) also inherit from Object.

And this design tells the developer who is using your library: "Hey, you can use base-class pointers to store and manipulate objects."

But if the developer uses the library that way, the reference count will not be decremented...
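For illustration, a sketch of the pattern such a design seems to invite (the function, the paths, and the group/dataset names are hypothetical; deleting through the base pointer here is undefined behaviour because Object's destructor is non-virtual):

#include <highfive/H5DataSet.hpp>
#include <highfive/H5File.hpp>
#include <highfive/H5Group.hpp>

#include <vector>

// Hypothetical usage: keep heterogeneous HDF5 handles behind the common base class.
void risky_pattern(HighFive::File& file) {
    std::vector<HighFive::Object*> handles;
    handles.push_back(new HighFive::Group(file.getGroup("/data")));
    handles.push_back(new HighFive::DataSet(file.getDataSet("/data/values")));

    for (HighFive::Object* handle : handles) {
        delete handle;  // undefined behaviour: ~Object() is not virtual
    }
}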

1uc commented 3 months ago

And this design tells the developer who is using your library: "Hey, you can use base-class pointers to store and manipulate objects."

It doesn't seem particularly tempting to treat all HighFive objects as HighFive::Object*; its interface is extremely limited, almost to the point of being useless.

We can try to turn it into a compiler error by making the dtor protected.
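For illustration, a minimal sketch of the technique (not the actual HighFive change): a protected, non-virtual destructor in the base class turns delete through a base pointer into a compile-time error, while derived objects can still be created and destroyed normally.

class Object {
  protected:
    ~Object() = default;  // protected: code outside the hierarchy cannot delete via Object*
};

class DataSet : public Object {};

void delete_via_base_ptr(Object* obj) {
    // delete obj;  // would not compile: 'Object::~Object()' is protected
}

int main() {
    DataSet ds;  // fine: DataSet's implicit public destructor runs at scope exit
}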

artiomn commented 3 months ago

We can try to turn it into a compiler error by making the dtor protected.

I think this is not a bad idea. Thank you.

P.S.:

But my problem is not here; I saw this while trying to fix the bug: "Test : SaveLoadDataSuite.Hdf5Test ..Bus error***Exception: 0.17 sec"

Two HDF5 save/load tests failed only on CI (gcc-13 Docker image, based on Debian Buster with additional dependencies). Locally (Manjaro Linux), all tests passed. The JSON save/load tests passed, and even the SONATA tests (a format that uses HDF5) passed...

Test code:

class SaveLoadDataSuite : public ::testing::Test
{
protected:
    void SetUp() override { messages_ = generate_random_messages(uid_, 200, 20, 0.2); }

    void TearDown() override { std::filesystem::remove(file_path_); }

    std::vector<knp::core::messaging::SpikeMessage> messages_;
    std::filesystem::path file_path_;
    knp::core::UID uid_;
};

TEST_F(SaveLoadDataSuite, Hdf5Test)
{
    file_path_ = "data.h5";
    knp::framework::storage::native::save_messages_to_h5(messages_, file_path_);
    ASSERT_EQ(messages_, knp::framework::storage::native::load_messages_from_h5(file_path_, uid_));
}

The functions in question look like this:

std::vector<core::messaging::SpikeMessage> load_messages_from_h5(
    const fs::path &path_to_h5, const knp::core::UID &uid, float time_per_step, bool strict_format)
{
    HighFive::File h5_file(path_to_h5.string());
    // Do smth. with groups and objects.
    // ...
    const auto node_dataset = data_group.getDataSet(node_name);
    // ...
    std::vector<int64_t> nodes(node_dataset.getElementCount());
    node_dataset.read(nodes);
}

void save_messages_to_h5(
    std::vector<core::messaging::SpikeMessage> messages, const std::filesystem::path &path_to_save, float time_per_step)
{
    HighFive::File data_file(path_to_save.string(), HighFive::File::Create | HighFive::File::Overwrite);
    // ...
    spike_group.createDataSet("node_ids", nodes);
    auto time_set = spike_group.createDataSet("timestamps", timestamps);
    time_set.createAttribute("units", std::string{"step"});
}

But on CI I got the warning: "Warning! ***HDF5 library version mismatched error***".

Could you tell me where the problem might be?

P.P.S.: Sorry for going off-topic.

1uc commented 3 months ago

One frequent source of problems when testing is non-unique filenames combined with parallel test runs. However, that usually manifests as a different error, something about exclusive access.
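As an aside, a minimal sketch (not from the thread) of one way to avoid such collisions: derive the test file name from the test's own identity via GoogleTest's metadata, instead of hard-coding something like "data.h5".

#include <gtest/gtest.h>

#include <filesystem>
#include <string>

// Sketch: build a per-test file name from GoogleTest's metadata so that
// parallel or repeated runs never share the same file.
// (test_suite_name() requires GoogleTest >= 1.10.)
std::filesystem::path unique_test_file_path() {
    const ::testing::TestInfo* info =
        ::testing::UnitTest::GetInstance()->current_test_info();
    return std::filesystem::path(
        std::string(info->test_suite_name()) + "_" + info->name() + ".h5");
}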

The warning Warning! ***HDF5 library version mismatched error*** suggests that something is compiled against one version of HDF5 and then linked, at run time, against another. IIUC, HDF5 has a (mostly) stable API but not a stable ABI, meaning you can recompile against a different version of HDF5 without changing your source code, but you can't link against a different version of HDF5 without recompiling your sources.

These are easy causes for mismatching versions of HDF5:

- Install HDF5 in a custom location and provide that location during the build. Then, when running, LD_LIBRARY_PATH might be missing, and a system-installed version of HDF5 is picked up instead of the copy installed in the custom location.

To find out which version of HDF5 is being used when the application runs, you can use ldd:

$ ldd build-opencv/tests/unit/test_all_types
    linux-vdso.so.1 (0x00007ffc1b3e3000)
    libhdf5.so.311 => /usr/lib/libhdf5.so.311 (0x000071c0c2000000)

(This tells me mine is using the system-installed HDF5.)

If the problem persists after you've made the HDF5 versions match: tools like gdb, valgrind, and sanitizers (e.g. AddressSanitizer) can help find where memory-related issues happen. For gdb, you can use ctest --test-dir ... -VV, which prints lots of detail, including the precise command and environment variables for running each test. (This information can then easily be used to run the test under gdb.)

artiomn commented 3 months ago

One frequent source of problems when testing is non-unique filenames combined with parallel test runs. However, that usually manifests as a different error, something about exclusive access.

No, GTest runs tests sequentially (I don't use test parallelization yet).

Install HDF5 in a custom location and provide that location during the build. Then, when running, LD_LIBRARY_PATH might be missing, and a system-installed version of HDF5 is picked up instead of the copy installed in the custom location.

I used to use the system HDF5 (installed in the image), but now I use libhdf5 downloaded via CPM.cmake (because I also need a Windows build). And I think you're right: finding and fixing the version mismatch in the first place is the right solution.

If the problem persists after you've made the HDF5 versions match: tools like gdb

Yes, building the CI image locally and a lot of debugging...

Ok, thank you, I'll try to fix it.