aldanor / hdf5-rust

HDF5 for Rust
https://docs.rs/hdf5
Apache License 2.0

Freeze trying to read compound datasets with variable length strings #251

Open geolehmann opened 1 year ago

geolehmann commented 1 year ago

Hi everybody,

I have problems reading compound datasets that contain variable-length strings. While I have no problems reading other types like floats or integers from a compound, my application freezes completely when I try to read the strings. Interestingly, I can read normal string datasets; the problem only occurs for compound datasets with strings (using VarLenAscii/VarLenUnicode/VarLenArray).

I am using the hdf5-types crate with the "h5-alloc" feature enabled on Windows 10, with version 1.14.0 of the HDF5 library.

This is the relevant code I use for loading the dataset:

#[derive(H5Type, Debug, Clone, PartialEq)]
#[repr(C)]
pub struct Index {
    pub start_index: u32,
    pub size: u32,
    pub object_ID: hdf5::types::VarLenUnicode,
    pub data_ID: hdf5::types::VarLenUnicode,
}

let index_dataset = file.dataset(&path).unwrap();
let index_data = index_dataset.read_1d::<h5well::Index>();

The screenshot below shows the structure of the dataset I am trying to load, as displayed in HDFView: [image]

I tried to hunt down the problem and it seems to be somewhere in the "read_into_buf" function, but I am stuck now. Did anybody encounter a similar issue or can point me in the right direction? Thanks in advance for any help!

mulimoen commented 1 year ago

A hang in read_into_buf suggests that the HDF5 library itself is doing some work or locking up. Do you have a debugger available to obtain a stack trace? Does reading a single element (Index) finish?

If the dataset is openly available I could check if this can be reproduced on linux and debug it further.

geolehmann commented 1 year ago

Yes, I tried reading only one element, but that did not work either. Here is a minimum example file, containing only the compound dataset: https://drive.google.com/file/d/1CJeFNq84Z_lfThG1r75NsQKv2kv4Are8/view?usp=sharing

About the stack trace: since the program does not crash, I probably need to capture one manually at some point? I captured one at the end of the read_raw function, since I have now observed that the freeze actually happens there, when it returns the result of the read_into_buf function. The relevant part of the trace looks like this:

}, {
     fn: "hdf5::hl::container::Reader::read_raw",
     file: "D:\dev\hdf5-rust-master\hdf5\src\hl\container.rs",
     line: 164
 }, {
     fn: "hdf5::hl::container::Reader::read",
     file: "D:\dev\hdf5-rust-master\hdf5\src\hl\container.rs",
     line: 140
 }, {
     fn: "hdf5::hl::container::Reader::read_1d",
     file: "D:\dev\hdf5-rust-master\hdf5\src\hl\container.rs",
     line: 173
 }, {
     fn: "hdf5::hl::container::Container::read_1d",
     file: "D:\dev\hdf5-rust-master\hdf5\src\hl\container.rs",
     line: 600
 }, {
     fn: "k4::loader_geoh5::load",
     file: ".\src\loader_geoh5.rs",
     line: 157
 }, {
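
For a program that hangs rather than crashes, a trace like the one above can be captured by hand at a chosen point. The following is a minimal sketch using only the Rust standard library (`std::backtrace`, stable since Rust 1.65), not necessarily the method used in this thread. The helper names `trace_here` and `log_checkpoint` are hypothetical.

```rust
use std::backtrace::{Backtrace, BacktraceStatus};

/// Captures a backtrace at the call site.
/// `force_capture` ignores the RUST_BACKTRACE environment variable.
fn trace_here() -> Backtrace {
    Backtrace::force_capture()
}

/// Hypothetical helper: prints the current call stack to stderr under a
/// label and reports whether a trace could actually be captured.
fn log_checkpoint(label: &str) -> BacktraceStatus {
    let trace = trace_here();
    eprintln!("checkpoint `{label}`:\n{trace}");
    trace.status()
}
```

Placing a call such as `log_checkpoint("before read")` just before the suspect call shows how execution got there. It cannot show where a hang occurs inside a foreign library; for that, attaching a debugger to the frozen process is usually more informative.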

mulimoen commented 1 year ago

It seems the strings are returned as null pointers, which causes issues (and should be fixed!). I think this specific issue can be fixed by renaming the members so that they match the names in the file, i.e.

#[derive(H5Type, Debug, Clone, PartialEq)]
#[repr(C)]
pub struct Index {
    #[hdf5(rename = "Start index")]
    pub start_index: u32,
    #[hdf5(rename = "Size")]
    pub size: u32,
    #[hdf5(rename = "Object ID")]
    pub object_ID: hdf5::types::VarLenUnicode,
    #[hdf5(rename = "Data ID")]
    pub data_ID: hdf5::types::VarLenUnicode,
}
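
To see why the rename is needed, the sketch below compares the two sets of names. It assumes, as the fix above implies, that compound members are matched by name and that the derive uses the Rust field identifiers when no rename is given. `unmatched_members` and both constants are hypothetical names for illustration, not part of the hdf5 crate; the file-side names are the ones from the rename attributes above.

```rust
// Member names as stored in the file (taken from the rename attributes above).
const FILE_MEMBERS: [&str; 4] = ["Start index", "Size", "Object ID", "Data ID"];

// Names used when no rename is given: the Rust field identifiers
// (an assumption of this sketch).
const FIELD_NAMES: [&str; 4] = ["start_index", "size", "object_ID", "data_ID"];

/// Returns the declared member names that have no exact, case-sensitive
/// counterpart among the member names found in the file.
fn unmatched_members<'a>(declared: &[&'a str], in_file: &[&str]) -> Vec<&'a str> {
    declared
        .iter()
        .copied()
        .filter(|name| !in_file.iter().any(|member| *member == *name))
        .collect()
}
```

Calling `unmatched_members(&FIELD_NAMES, &FILE_MEMBERS)` returns all four field names, since none of them (not even `size` versus `Size`) matches exactly. With the renames applied, the list would be empty.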

geolehmann commented 1 year ago

It works - a thousand thanks for your fast help!! I was not aware of the rename helper attribute, I should have read the changelog....

mulimoen commented 1 year ago

We should do something about the freeze and the segfault; reopening as a reminder.