UoB-HPC / stdpar-nbody

https://research-information.bris.ac.uk/en/publications/efficient-tree-based-parallel-algorithms-for-n-body-simulations-u
MIT License

Substantial Hilbert tree improvement: Improve distance threshold calculation & memory access pattern #39

Closed: illuhad closed this 3 months ago

illuhad commented 3 months ago

With this, time per step for 7 million particles drops from 2.8s to 1.8s on my GPU with theta=0.5. (outdated, see EDIT)
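For context on theta: in Barnes–Hut-style tree codes, theta is the opening-angle threshold that decides whether a distant cell can be approximated by its monopole instead of being opened. The sketch below shows the textbook criterion (accept when cell size / distance < theta), evaluated on squared quantities to avoid a square root. It is an illustration under that assumption, not the threshold calculation changed in this PR. `Vec3` and `accept_monopole` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical minimal 3D vector type for illustration.
struct Vec3 {
    double x, y, z;
};

// Textbook Barnes-Hut acceptance test (assumed form, not the PR's code):
// a cell of side length `size` whose center of mass is at distance d from the
// particle may be treated as a single monopole if size / d < theta.
// Since all quantities are non-negative, this is equivalent to
// size^2 < theta^2 * d^2, which avoids computing sqrt(d^2).
bool accept_monopole(const Vec3& particle, const Vec3& cell_com,
                     double size, double theta) {
    assert(theta > 0.0 && size >= 0.0);
    const double dx = cell_com.x - particle.x;
    const double dy = cell_com.y - particle.y;
    const double dz = cell_com.z - particle.z;
    const double d2 = dx * dx + dy * dy + dz * dz;
    return size * size < theta * theta * d2;
}
```

With theta = 0.5, a unit-size cell 10 units away passes (1 < 25) and is approximated, while the same cell 1 unit away fails (1 < 0.25 is false) and must be opened. Smaller theta opens more cells, trading speed for accuracy.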

Not sure if we need to include this in the paper, as most of our conclusions likely won't change. Maybe useful for future work, or for the camera-ready version.

EDIT: I've added a further improvement. We no longer store monopole mass and position separately; instead, a single 4-component vector holds both (in the 3D case). This not only simplifies the memory access pattern, it also makes our vec objects aligned such that compilers can emit vector loads. With this, time per step drops further to 1.3s.
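A minimal sketch of the packed layout described in the EDIT, assuming a 3D monopole stored as (x, y, z, mass) in one record aligned to its full size. With `float` that is 16 bytes; with `double`, 32 bytes. In either case, one aligned vector load can fetch everything a force evaluation needs. The type and function names (`MonopolesSeparate`, `Monopole4`, `monopole_accel`), G = 1, and the softening term are hypothetical choices for illustration, not the repository's actual code.

```cpp
#include <array>
#include <cmath>
#include <vector>

// Hypothetical "before" layout: mass and position in separate arrays, so one
// monopole lives in two places, and 12-byte position elements are not 16-byte aligned.
struct MonopolesSeparate {
    std::vector<float> mass;
    std::vector<std::array<float, 3>> position;
};

// Hypothetical "after" layout: position and mass packed into one 4-component
// record, aligned to its own size so compilers can use aligned vector loads.
template <typename T>
struct alignas(4 * sizeof(T)) Monopole4 {
    T x, y, z, mass;
};

static_assert(sizeof(Monopole4<float>) == 16 && alignof(Monopole4<float>) == 16,
              "float monopole should be one 16-byte aligned record");
static_assert(sizeof(Monopole4<double>) == 32 && alignof(Monopole4<double>) == 32,
              "double monopole should be one 32-byte aligned record");

// Acceleration on a particle at p due to one monopole (assumed G = 1,
// Plummer-style softening eps2). All four fields come from the same record.
template <typename T>
std::array<T, 3> monopole_accel(const Monopole4<T>& m, const std::array<T, 3>& p, T eps2) {
    const T dx = m.x - p[0];
    const T dy = m.y - p[1];
    const T dz = m.z - p[2];
    const T r2 = dx * dx + dy * dy + dz * dz + eps2;
    const T inv_r = T(1) / std::sqrt(r2);
    const T s = m.mass * inv_r * inv_r * inv_r;  // m / r^3
    return {s * dx, s * dy, s * dz};
}
```

The design choice is the array-of-structs-of-four pattern. It wastes no padding in 3D, because the fourth lane carries the mass. It also turns the two scattered reads of the separate layout into one contiguous, aligned read per monopole.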