BlueBrain / HighFive

HighFive - Header-only C++ HDF5 interface
https://bluebrain.github.io/HighFive/
Boost Software License 1.0
673 stars 159 forks source link

Optimize chained hyperslab selection. #1031

Closed 1uc closed 1 month ago

1uc commented 1 month ago

A common pattern for creating semi-unstructured selection is to use many (small) RegularHyperSlab and chain them:

HyperSlab hyperslab;
for(auto slab : regular_hyper_slabs) {
  hyperslab |= slab;
}

This eventually triggers calling:

for(auto slab : regular_hyper_slabs) {
  auto [offset, stride, counts, blocks] = slab;
  H5Sselect_hyperslab(space_id, offset, stride, counts, block);
}

Measurements show that this has runtime that's quadratic in the number of regular hyper slabs. This starts becoming prohibitive at 10k - 40k slabs.

We noticed that H5Scombine_select does not suffer from the same performance issue. This allows us to optimize (long) chain of Op::Or using divide and conquer.

The current implementation only optimizes streaks of Op::Or.

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 86.88%. Comparing base (8145c27) to head (94727b8).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #1031 +/- ## ========================================== + Coverage 86.78% 86.88% +0.09% ========================================== Files 101 101 Lines 5964 6008 +44 ========================================== + Hits 5176 5220 +44 Misses 788 788 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

1uc commented 1 month ago

Yes, they provided the following reproducer:

import libsonata
import numpy as np
import time

np.random.seed(42)

sto = libsonata.NodeStorage('sscx-nodes-sonata.h5')
pop = sto.open_population('All')

#ids = np.arange(0, 100000, 2)
count = int(0.01*pop.size)
# count = 100
print(f'selecting {count} from {pop.size}')
ids = np.random.randint(0, pop.size, count)

t1 = time.perf_counter()
sel = libsonata.Selection(ids)
t2 = time.perf_counter()
print(f"elapsed = {t2 - t1}")

print(np.mean(pop.get_attribute('x', sel)))

The selection results in about 41k slabs, which takes 40s with the performance bug and 0.06 - 0.1s with libsonata@master and highfive@1uc/backport-optimize-hyperslab-selection. It takes 0.1 - 0.16s for libsonata@0.1.24.

We also ran their integration tests against the backport of this branch: https://github.com/BlueBrain/HighFive-testing/actions/runs/10108668331