celeritas-project / celeritas

Celeritas is a new Monte Carlo transport code designed to accelerate scientific discovery in high energy physics by improving detector simulation throughput and energy efficiency using GPUs.
https://celeritas-project.github.io/celeritas/user/index.html

Improve HGCal build performance by a factor of 10 using surface hashing #1183

Closed sethrj closed 1 month ago

sethrj commented 3 months ago

On #1180, loading and converting the HGCal geometry takes ~12 seconds, most of which is spent in surface deduplication. As anticipated, searching through all existing surfaces every time a new one is inserted doesn't scale: the total number of comparisons grows quadratically with the number of surfaces.

This PR computes a spatial "bin" for each surface, per surface type, such that all surfaces that could compare soft equal to it are in the same bin (or, if the surface is at the edge of the bin, in an adjacent bin). When a new surface is inserted, its ID is added to an unordered multimap, which subsequent insertions search instead of scanning every existing surface.

The bin width is currently hardcoded to 0.01 times the length scale (which is 1 mm for converted Geant4 geometry). This doesn't mean that surfaces that close together will be deduplicated; it only means that all surfaces within 0.01 mm of each other will be checked for soft equivalence.
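As a rough illustration of the binning idea, here is a minimal sketch reduced to one dimension, where each "surface" is a plane described by a single coordinate. The names `Plane` and `HashedPlaneInserter`, the absolute-difference soft-equality test, and the tolerance value are all assumptions made for the example; none of this is the actual Celeritas code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a surface: a plane normal to one axis,
// described by a single coordinate.
struct Plane
{
    double position;
};

// Illustrative only: not the Celeritas class or API.
class HashedPlaneInserter
{
  public:
    // Searching only the adjacent bins is sufficient if tol <= bin_width.
    HashedPlaneInserter(double bin_width, double tol)
        : bin_width_(bin_width), tol_(tol)
    {
        assert(bin_width_ > 0 && tol_ >= 0 && tol_ <= bin_width_);
    }

    // Return the ID of an existing soft-equal plane, or insert a new one.
    std::size_t insert(Plane p)
    {
        std::int64_t const bin = this->bin_of(p.position);
        // Candidates can only be in this bin or one of its two neighbors
        for (std::int64_t b = bin - 1; b <= bin + 1; ++b)
        {
            auto range = bins_.equal_range(b);
            for (auto it = range.first; it != range.second; ++it)
            {
                // Simplified soft equality: absolute difference within tol
                double const other = planes_[it->second].position;
                if (std::fabs(other - p.position) <= tol_)
                {
                    return it->second;
                }
            }
        }
        // No match: store the plane and record its ID under its bin
        std::size_t const id = planes_.size();
        planes_.push_back(p);
        bins_.emplace(bin, id);
        return id;
    }

    std::size_t size() const { return planes_.size(); }

  private:
    std::int64_t bin_of(double x) const
    {
        return static_cast<std::int64_t>(std::floor(x / bin_width_));
    }

    double bin_width_;
    double tol_;
    std::vector<Plane> planes_;
    std::unordered_multimap<std::int64_t, std::size_t> bins_;
};
```

With a bin width of 0.01 and a tolerance of 1e-6, inserting planes at 0.0099999 and 0.0100001 puts them in different bins, yet the second insertion still finds the first through the adjacent-bin check. Each insertion only compares against the handful of surfaces in three bins rather than against every surface stored so far.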

Performance testing

With current develop, the surface insertion performance test (which inserts planes and spheres) clearly shows quadratic scaling: each doubling of the surface count increases the time by a factor of about 3.4 to 4.

Sampling 16...8.75e-06 s
Sampling 32...2.9792e-05 s
Sampling 64...0.000111834 s
Sampling 128...0.000434041 s
Sampling 256...0.00161142 s
Sampling 512...0.00625429 s
Sampling 1024...0.0245268 s
Sampling 2048...0.0968056 s
Sampling 4096...0.370775 s
Sampling 8192...1.42327 s
Sampling 16384...5.27299 s
Sampling 32768...18.7109 s

On this branch, performance is much closer to linear. The faster-than-linear growth at higher surface counts is likely because the acceleration grid tolerance needs tweaking.

Sampling 16...1.0458e-05 s
Sampling 32...1.7834e-05 s
Sampling 64...3.325e-05 s
Sampling 128...7.0083e-05 s
Sampling 256...0.000154959 s
Sampling 512...0.00034425 s
Sampling 1024...0.00083325 s
Sampling 2048...0.00217404 s
Sampling 4096...0.00705275 s
Sampling 8192...0.0232007 s
Sampling 16384...0.0844078 s
Sampling 32768...0.324391 s
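One way to put a number on "quadratic" versus "closer to linear" is to estimate the exponent k in t ~ n^k from two samples. The sketch below does that with a hypothetical helper, `scaling_exponent`, using endpoint timings copied from the two sets of output above; it is a back-of-the-envelope check, not part of the performance test itself.

```cpp
#include <cmath>

// Hypothetical helper: exponent k such that t ~ n^k between two samples,
// i.e. the slope of the line through them on a log-log plot.
double scaling_exponent(double n1, double t1, double n2, double t2)
{
    return std::log(t2 / t1) / std::log(n2 / n1);
}

// Timings at 16 and 32768 surfaces on develop (copied from the output above)
double develop_exponent()
{
    return scaling_exponent(16, 8.75e-06, 32768, 18.7109);
}

// Timings at 16 and 32768 surfaces on this branch (copied from the output above)
double branch_exponent()
{
    return scaling_exponent(16, 1.0458e-05, 32768, 0.324391);
}
```

Over the full range this gives an exponent of roughly 1.9 for develop and roughly 1.4 for this branch. Restricting the branch data to 16 through 1024 surfaces gives an exponent close to 1, while 4096 through 32768 gives about 1.8, which matches the observation that the scaling degrades at higher surface counts.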

sethrj commented 1 month ago

@elliottbiondo I'm confident now that this works, so you can review this as well. (When we do performance testing we can revert this PR pretty easily since it doesn't touch much code.)

elliottbiondo commented 1 month ago

(will do today)