kdheepak closed this issue 1 year ago
Just iterating over the groups and datasets seems fast, but it looks like the part that's actually slow is getting ds.name(). Any suggestions for the fastest way to get the fully qualified dataset names for all the datasets in a file?
Could you try the iter_visit method instead of getting the name after iterating? We don't store the name of the object; we fetch it using the id, which might be slow.
Thanks for your comment @mulimoen and all your work on this crate!
After looking at alternative methods, I ended up doing this instead:
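// Build "group/dataset" paths from member names instead of calling ds.name().
let f = hdf5::File::open(&file).unwrap();
let mut names = Vec::new();
for group in f.member_names().unwrap() {
    for dataset in f.group(&group).unwrap().member_names().unwrap() {
        names.push(format!("{}/{}", group, dataset));
    }
}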
which worked out to be quite a bit faster. It now takes less than 5 seconds to get all the names, open each dataset by name, and read some metadata from each dataset. I didn't time the exact difference between this and Julia, but this is now usable for me, so I can close this issue.
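The loop above assumes exactly two levels (groups at the root, datasets inside them). For deeper hierarchies, the same idea of joining member names can be applied recursively. The following is only a rough sketch of that idea: `Entry`, `collect_dataset_paths`, and `example_tree` are hypothetical names, and the in-memory tree is a stand-in for the file, not part of the hdf5 crate.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for an HDF5 hierarchy (not an hdf5 crate type).
enum Entry {
    Group(BTreeMap<String, Entry>),
    Dataset,
}

/// Recursively collect full dataset paths by joining member names,
/// so no per-object name lookup is needed.
fn collect_dataset_paths(prefix: &str, entry: &Entry, names: &mut Vec<String>) {
    match entry {
        // A dataset's path is whatever prefix was accumulated on the way down.
        Entry::Dataset => names.push(prefix.to_string()),
        Entry::Group(members) => {
            for (name, child) in members {
                // Avoid a leading '/' for members of the root group.
                let path = if prefix.is_empty() {
                    name.clone()
                } else {
                    format!("{}/{}", prefix, name)
                };
                collect_dataset_paths(&path, child, names);
            }
        }
    }
}

/// Small example tree: two groups at the root, one of them holding a nested group.
fn example_tree() -> Entry {
    let mut inner = BTreeMap::new();
    inner.insert("c".to_string(), Entry::Dataset);

    let mut g1 = BTreeMap::new();
    g1.insert("a".to_string(), Entry::Dataset);
    g1.insert("nested".to_string(), Entry::Group(inner));

    let mut g2 = BTreeMap::new();
    g2.insert("b".to_string(), Entry::Dataset);

    let mut root = BTreeMap::new();
    root.insert("g1".to_string(), Entry::Group(g1));
    root.insert("g2".to_string(), Entry::Group(g2));
    Entry::Group(root)
}
```

Calling `collect_dataset_paths("", &example_tree(), &mut names)` with an empty `Vec<String>` fills it with one path per dataset. Against a real file, the map lookups would presumably be replaced by the `member_names()` and `group()` calls used above, plus some way of telling groups from datasets, which this sketch does not cover.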
I have an HDF5 file that has almost 4492 datasets.
If I iterate over all the groups and then all the datasets in Julia, it takes around half a second:
However, in Rust it takes almost a full minute. Here's the code that I'm using:
And here's how I'm testing it after adding the above code to ./src/names.rs:
Do you know what I'm doing incorrectly? It is the exact same HDF5 file that I'm using in Rust and Julia.