Distribute particles between MPI ranks using hash of particle ID

SWIFTSIM / HBTplus

HBTplus halo finder adapted for the FLAMINGO and COLIBRE simulations


Distribute particles between MPI ranks using hash of particle ID #27

Closed jchelly closed 1 month ago

jchelly commented 2 months ago

Currently HBT arranges snapshot particles for easy lookup by ID by putting different ranges of IDs on different MPI ranks. The algorithm used to determine the assignment of ID ranges to ranks seems to become very slow on hydro runs where the IDs are not a contiguous sequence.

This pull request modifies the code to distribute particles according to the hash of their ID. Assigning a particle to a rank is then just rank = hash(id) % comm_size, which is simple and fast, but it relies on the hash values being evenly distributed to ensure reasonable load balancing.
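A minimal sketch of the rank assignment described above. The hash used in the pull request is not shown in this thread, so the mixer below (the SplitMix64 finalizer) is an assumption, and the names HashParticleId, RankOfParticle and CountPerRank are hypothetical. MPI is left out so that the snippet stands alone; comm_size would normally come from MPI_Comm_size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 64-bit mixer (SplitMix64 finalizer). This is an assumption:
// the hash function actually used by the pull request is not shown here.
inline std::uint64_t HashParticleId(std::uint64_t id)
{
  id ^= id >> 30;
  id *= 0xbf58476d1ce4e5b9ULL;
  id ^= id >> 27;
  id *= 0x94d049bb133111ebULL;
  id ^= id >> 31;
  return id;
}

// rank = hash(id) % comm_size
inline int RankOfParticle(std::uint64_t id, int comm_size)
{
  assert(comm_size > 0);
  return static_cast<int>(HashParticleId(id) % static_cast<std::uint64_t>(comm_size));
}

// Count how many of the given IDs land on each rank, as a quick
// load-balance check for a particular set of IDs.
inline std::vector<std::size_t> CountPerRank(const std::vector<std::uint64_t> &ids, int comm_size)
{
  std::vector<std::size_t> counts(static_cast<std::size_t>(comm_size), 0);
  for (std::uint64_t id : ids)
    ++counts[static_cast<std::size_t>(RankOfParticle(id, comm_size))];
  return counts;
}
```

Because the destination rank depends only on the ID and comm_size, any rank can work out where a particle lives without communication and without the IDs forming a contiguous sequence. The cost is that load balance depends entirely on the quality of the hash, so a helper like CountPerRank is one way to check it on real ID sets.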

jchelly commented 2 months ago

This is not ready to merge. I get a failed assert() on the small Colibre test.

jchelly commented 2 months ago

The problem was that in SubhaloSnapshot_t::UpdateMostBoundPosition we look up particles by ID without knowing their type, so we don't know whether they should be found. I've set Type=TypeMax in this case to avoid triggering the check. This didn't show up previously because I was using the orphan_tracers branch, where the tracers could never disappear.

jchelly commented 2 months ago

Now that the orphan_tracers branch is merged we always expect tracers to exist, so when the particle exchanger is called from UpdateMostBoundPosition we can assert that all particles should be found.

jchelly commented 2 months ago

On the small Colibre test this now gives almost identical output to master if I run on one thread and disable random sampling. In the final output the number of subhalos is identical, and all of their properties are identical except for Rank (many differences), Depth (one subhalo differs) and SnapshotIndexOfLastIsolation (one subhalo differs).

I think maybe I've inadvertently changed the ordering of subhalos that have the same mass within their host halo.

jchelly commented 2 months ago

If I modify CompareMass_t to use the TrackId as a tie breaker when the masses are equal, then this branch and master produce output that is identical at snapshot 15, except for rounding-error-level differences in SpecificSelfPotentialEnergy for five halos.
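A sketch of what a mass comparison with a TrackId tie breaker might look like. The types here are simplified stand-ins, not the HBTplus definitions: SubhaloSketch, CompareMassWithTieBreak and OrderByMass are hypothetical names, and breaking ties by ascending TrackId is an assumption, since the thread does not say which direction was chosen.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for a subhalo record (hypothetical; only the
// fields needed for the comparison are included).
struct SubhaloSketch
{
  std::int64_t TrackId;
  float Mbound;
};

// Orders subhalo indices by decreasing mass. When two masses are exactly
// equal, the lower TrackId sorts first (assumed direction), so the result
// does not depend on the order in which the subhalos were stored.
struct CompareMassWithTieBreak
{
  const std::vector<SubhaloSketch> *Subhalos;

  explicit CompareMassWithTieBreak(const std::vector<SubhaloSketch> &subhalos)
    : Subhalos(&subhalos) {}

  bool operator()(std::size_t i, std::size_t j) const
  {
    const SubhaloSketch &a = (*Subhalos)[i];
    const SubhaloSketch &b = (*Subhalos)[j];
    if (a.Mbound != b.Mbound)
      return a.Mbound > b.Mbound;
    return a.TrackId < b.TrackId;
  }
};

// Returns the indices of the subhalos sorted with the comparison above.
inline std::vector<std::size_t> OrderByMass(const std::vector<SubhaloSketch> &subhalos)
{
  std::vector<std::size_t> order(subhalos.size());
  for (std::size_t i = 0; i < order.size(); ++i)
    order[i] = i;
  std::sort(order.begin(), order.end(), CompareMassWithTieBreak(subhalos));
  return order;
}
```

Without the tie breaker, std::sort is free to place equal-mass subhalos in either order, so a change in how particles or subhalos are distributed can alter quantities derived from the ordering even when the physics is unchanged.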