Closed. XTRY1337 closed this 5 months ago.
Which version of PureHDF are you using? The newer ones should not allow `Read<double>().ToList()`; instead, `Read<double[]>().ToList()` needs to be used to make it work like before.
PureHDF is not yet optimized for high performance, especially when it comes to many small groups / datasets, since there is no metadata cache yet. Reading big datasets (with a reasonable chunk size) should be quite fast, though.
There is still some room for improvement in your code. We can dramatically reduce the number of file structure lookups by using the following code:
```csharp
// Requires System.Linq for OfType<T>() and ToList().
List<List<List<List<double>>>> ReadStruct(string groupPath, NativeFile readFile)
{
    List<List<List<List<double>>>> Group1 = new();

    // Look up the top-level group once, then walk its children directly
    // instead of resolving every child by path.
    foreach (var group1 in readFile.Group(groupPath).Children().OfType<IH5Group>())
    {
        List<List<List<double>>> Group2 = new();

        foreach (var group2 in group1.Children().OfType<IH5Group>())
        {
            List<List<double>> Group3 = new();

            // Each dataset holds a small array of doubles.
            foreach (var dataset in group2.Children().OfType<IH5Dataset>())
            {
                Group3.Add(dataset.Read<double[]>().ToList());
            }

            Group2.Add(Group3);
        }

        Group1.Add(Group2);
    }

    return Group1;
}
```
Maybe this speeds up things a little bit. If that does not help, an example file would be useful to see where the bottleneck is.
I'm using 1.0.0-alpha.25, but using `.OfType<>` improved the performance a lot. Thank you very much for that. Keep up the good work.
I am glad it works better now :-)
Note: When upgrading to one of the newer beta versions, there will be a few breaking changes, such as the one mentioned above (`double` vs. `double[]`).
More info here: https://github.com/Apollo3zehn/PureHDF/releases/tag/v1.0.0-beta.1
Hello, I'm trying to read a nested group hierarchy in my HDF5 file, e.g. Group1/Group2/Group3/Datasets, with sizes such as: Group1 = 8, Group2 = 6, Group3 = 400, and each dataset = 2 double values. Since I don't have a function that can read everything directly from a group, I wrote the code below to go through all the groups. It usually takes about 20 to 30 seconds to load everything, because I'm reading 2 structures of that type, which makes a total of 76,800 values to be read. Is there any function in the library I can use to reduce the time this reading takes? My code: ![image](https://github.com/Apollo3zehn/PureHDF/assets/82593913/690e2be2-7f28-44ff-ada4-42d66ba6cb74)
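For reference, the totals implied by these sizes can be worked out as follows. This assumes the sizes mean that each Group1 entry holds 6 Group2 entries, each Group2 entry holds 400 datasets, and each dataset holds 2 doubles; that reading is an assumption, but it reproduces the 76,800 figure quoted above.

```math
\begin{aligned}
\text{datasets per structure} &= 8 \times 6 \times 400 = 19{,}200 \\
\text{values per structure} &= 19{,}200 \times 2 = 38{,}400 \\
\text{values in two structures} &= 2 \times 38{,}400 = 76{,}800
\end{aligned}
```

Under that reading, the 76,800 values are spread over 38,400 very small datasets, so the number of objects to locate is likely to matter more than the amount of data.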