
Dataset.Read<T>() performance issue #32

Open · ReikanYsora opened this issue 1 year ago

ReikanYsora commented 1 year ago

Hello,

I'm creating this "issue" to try and find out how to improve read performance for a complete dataset.

At the moment, I sometimes have to load fairly large files (around 800 MB), and it can take up to twenty minutes to read a complete dataset, even when trying to tweak the buffer and chunk sizes in PureHDF.

Do you have any other ideas on how I can improve performance? I can't use multi-threading or asynchronous code (my project uses Unity3D and is therefore limited to .NET Standard 2.1).

Thanks in advance!

Apollo3zehn commented 1 year ago

Does your data look like the data in #29? I.e. your dataset is 8261 x 361 (OK) but your chunk size is 8261 x 1 (bad for reading)?

PureHDF does the following when reading a dataset with a chunk size like this:

Row 1:

  1. read and decompress the first chunk (8261 x 1), let's call it "A",
  2. then take the first value: A[0] and keep the chunk in the chunk cache
  3. read and decompress the second chunk (8261 x 1), "B"
  4. then take the first value: B[0] and keep the chunk in the chunk cache

... and so on, until the first row has been fully read

Row 2:

  1. Now, repeat everything, except that we now copy A[1], B[1], and so on
  2. ...

Row x: ...

If the chunk cache is large enough, the performance might still be OK, but if the cache is too small, all the chunks have to be read and decompressed over and over again, which is a performance nightmare.

So the root cause most likely lies in the chunk layout (8261 x 1). If the chunk size were transposed, i.e. 1 x 8261, your data would be read blazingly fast (but there is always a trade-off: writing the data would become more expensive).

In the example above, without any chunk cache, there would be 8261 x 361 = 2,982,221 single read operations, and each operation reads and decodes a full chunk. So this is as bad as it could be. More information here: https://github.com/Apollo3zehn/PureHDF/issues/17#issuecomment-1403809255
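To make that concrete, here is a small back-of-the-envelope sketch (plain C#; the counts follow directly from the numbers above and are for illustration only):

// Dataset: 8261 rows x 361 columns, chunk layout: 8261 x 1 (one chunk per column).
const long rows = 8261;
const long columns = 361;

// Without a chunk cache, every element access decodes the chunk that contains it,
// so a row-by-row read performs one chunk decode per element:
long decodesWithoutCache = rows * columns; // 2,982,221 chunk decodes

// With a cache large enough to hold all 361 column chunks, each chunk is decoded once:
long decodesWithCache = columns; // 361 chunk decodes

// With a transposed (row-wise) chunk layout, a row-by-row read also decodes each
// chunk exactly once, even without a cache:
long decodesWithRowChunks = rows; // 8,261 chunk decodes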

You can reduce the number of unnecessary chunk reads by using a custom selection (DelegateSelection) as shown here:

using System.Collections.Generic;
using System.Linq;

using PureHDF;

var root = H5File.OpenRead("my file path");
var dataset = root.Dataset("my dataset");
var rank = dataset.Space.Dimensions.Length;

/* dataset dimensions */
var rows = (uint)dataset.Space.Dimensions[0];
var columns = (uint)dataset.Space.Dimensions[1];

/* chunk dimensions */
var chunkRows = (uint)dataset.Layout.ChunkDimensions[0];
var chunkColumns = (uint)dataset.Layout.ChunkDimensions[1];

// dataset (source)
IEnumerable<Step> SourceWalker(ulong[] limits)
{
    var coordinates = new ulong[rank]; // reuse array to reduce GC pressure

    for (uint i = 0; i < columns; i += chunkColumns)
    {
        coordinates[0] = 0; 
        coordinates[1] = i;

        yield return new Step() 
        { 
            Coordinates = coordinates,
            ElementCount = rows
        };
    }
}

var totalElementCount = dataset.Space.Dimensions.Aggregate(1UL, (x, y) => x * y);
var fileSelection = new DelegateSelection(totalElementCount, SourceWalker);

// memory (target)
IEnumerable<Step> TargetWalker(ulong[] limits)
{
    var coordinates = new ulong[rank]; // reuse array to reduce GC pressure

    for (uint row = 0; row < rows; row++)
    {
        for (uint column = 0; column < columns; column += chunkColumns)
        {
            coordinates[0] = row;
            coordinates[1] = column;

            yield return new Step() 
            {
                Coordinates = coordinates, 
                ElementCount = chunkColumns 
            };
        }
    }
}

var memorySelection = new DelegateSelection(totalElementCount, TargetWalker);

// read
var memoryDims = new ulong[] { rows, columns };

var result = dataset
    .Read<double>(
        fileSelection: fileSelection,
        memorySelection: memorySelection,
        memoryDims: memoryDims
    )
    .ToArray2D(rows, columns);

This should speed up your read operation. There are further optimizations possible. For example, right now, the target walker (which tells PureHDF how to fill the target array) is quite expensive. If the code above is not enough to meet your performance requirements, the target walker may be omitted, but then your target array contains the data in the wrong order. That array then needs to be reordered manually, which is probably a bit faster than using the TargetWalker implementation from above.
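For illustration only, here is a minimal sketch of that last variant. It assumes that calling Read<double> with just the fileSelection from above returns a flat array in the order the SourceWalker produced it (column by column), so restoring row-major order amounts to a simple transpose:

// Read with the file selection only; the data arrives column by column.
var flat = dataset.Read<double>(fileSelection: fileSelection);

// Reorder into a row-major 2D array.
var ordered = new double[rows, columns];

for (uint column = 0; column < columns; column++)
{
    for (uint row = 0; row < rows; row++)
    {
        ordered[row, column] = flat[column * rows + row];
    }
}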

ReikanYsora commented 1 year ago

Thanks for your answer!

Unfortunately, this solution takes even longer to read. The TargetWalker looks extremely heavy: with it, the reading time roughly triples.

Apollo3zehn commented 1 year ago

Is it faster when you remove the memorySelection parameter from the Read() method? Then your data will be in the wrong order, but at least we will know where to optimize.

Could you please send a screenshot of the dataset properties opened in HDFView, as you did in #29? Or is it exactly the same? I am mainly interested in the chunk dimensions, but other information may be helpful as well.

Apollo3zehn commented 1 year ago

I have tried to reproduce the problem with the sample file you sent me earlier but it is too small to find the bottleneck. It just loads too fast.

WHTaylor commented 7 months ago

Hi @Apollo3zehn. I've recently come back to a project that uses this library and ran into what I think was this issue when trying to upgrade from a much older version (when the project was still HDF5.NET) to the most recent one.

I eventually managed to track it down: it only occurs on .NET 6+ and was introduced between the commits ed7b34a and ed25d6a (I found this mostly by accidentally using .NET 5 once in a test and seeing it suddenly speed up again). It looks like H5SafeFileHandleReader can be significantly slower than H5StreamReader; the exact slowdown seems to depend heavily on the shape of the data, varying from basically nothing to ~25x slower for the various datasets I tested against.

I found that using a memory-mapped file, as explained here, did the trick; I believe that causes PureHDF to use H5StreamReader again. I assume most people use H5File.OpenRead(string) as the 'default' way of using PureHDF, so it might be a bit of a problem that it can cause such a large difference in performance.
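For reference, a rough sketch of that memory-mapped-file workaround (this assumes an H5File.Open overload that accepts a Stream, as described in the PureHDF documentation the comment refers to; the file and dataset names are placeholders):

using System.IO;
using System.IO.MemoryMappedFiles;

using PureHDF;

// Map the file read-only and let PureHDF read through the memory-mapped view,
// which (per the observation above) avoids the slower H5SafeFileHandleReader path.
using var mmf = MemoryMappedFile.CreateFromFile(
    "my file path", FileMode.Open, null, 0, MemoryMappedFileAccess.Read);

using var stream = mmf.CreateViewStream(0, 0, MemoryMappedFileAccess.Read);

var root = H5File.Open(stream);
var dataset = root.Dataset("my dataset");
var data = dataset.Read<double>();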

Hope that's useful information, and thanks for your work on this; the library has been really useful. I can supply the datasets I used for testing if needed.

Apollo3zehn commented 7 months ago

A test dataset would be great :-)

I was not aware of the performance issues. I am planning to integrate more benchmarks before the final release.

Thanks for your investigation!

WHTaylor commented 7 months ago

datasets.zip

The dataset in each of the files is at the path /raw_data_1/detector_1/counts. They all use chunk layouts with a size of 1 x 8 x <length of the array>.

| File | Array dimensions | Approx. read time with H5StreamReader | Approx. read time with H5SafeFileHandleReader |
| ---- | ---------------- | ------------------------------------- | --------------------------------------------- |
| ALF  | 1 x 2386 x 1361  | 30-60 ms                              | ~500 ms                                        |
| INS  | 1 x 168 x 17250  | 50 ms                                 | 50 ms (no change)                              |
| SXD  | 1 x 45100 x 1821 | 0.5 s                                 | 10-15 s                                        |
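A minimal timing sketch for reproducing numbers like these, using the Read API shown earlier in this thread (the file name and element type are assumptions; the dataset path is the one given above):

using System;
using System.Diagnostics;

using PureHDF;

// "ALF.nxs" is a hypothetical file name standing in for one of the files from datasets.zip.
var stopwatch = Stopwatch.StartNew();

var root = H5File.OpenRead("ALF.nxs");
var counts = root.Dataset("/raw_data_1/detector_1/counts");

// Assuming the counts are stored as 32-bit integers; adjust the type parameter if needed.
var data = counts.Read<int>();

stopwatch.Stop();
Console.WriteLine($"Read {data.Length} values in {stopwatch.ElapsedMilliseconds} ms");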