SWIFTSIM / HBTplus

HBTplus halo finder adapted for the FLAMINGO and COLIBRE simulations
0 stars 0 forks source link

Remove duplicate particles in mergers #32

Closed VictorForouhar closed 1 month ago

VictorForouhar commented 2 months ago

Do merging checks after unbinding an object, rather than the whole set of objects. This ensures we can update the boundness of particles before unbound ones are passed to the parent of the subhalo. In some cases, this leads to particle duplication.

VictorForouhar commented 2 months ago

Running first test with COLIBRE example box.

VictorForouhar commented 2 months ago

Early testing suggests new merging method does indeed decrease number of duplicate particles. We achieve the same number as tests where merging between subhaloes are disabled.

As a comparison, the master version produces:

Testing presence of duplicate particles in HBTplus at snapshot index 10.
Opening HBT catalogue: /cosma7/data/dp004/dc-foro1/colibre/hbt_testing/merged_master/010/SubSnap_010.{file_nr}.hdf5 --- DONE
1170 particles out of 8695713 are duplicate.
60 unique subhalos out of 57064 share particles.

Master version without merging:

Testing presence of duplicate particles in HBTplus at snapshot index 10. 
Opening HBT catalogue: /cosma7/data/dp004/dc-foro1/colibre/hbt_testing/removing_duplicates_no_mergers/010/SubSnap_010.{file_nr}.hdf5 --- DONE
70 particles out of 8673697 are duplicate.
13 unique subhalos out of 57064 share particles.

New merging version:

Testing presence of duplicate particles in HBTplus at snapshot index 10. 
Opening HBT catalogue: /cosma7/data/dp004/dc-foro1/colibre/hbt_testing/new_merging_criteria_fix/010/SubSnap_010.{file_nr}.hdf5 --- DONE
70 particles out of 8676564 are duplicate.
13 unique subhalos out of 57064 share particles.
VictorForouhar commented 2 months ago

There are some small differences in the timing and number of mergers. I am assuming this is due to the fact that the master version only checks merging with the phase-space information of the first round of unbinding.

The new version uses up-to-date phase-space information, e.g. if a subhalo can become overlapping with its parent after accreting particles from its satellite. The master version does not take into account the newly accreted particles. Will check a few examples next week.

VictorForouhar commented 2 months ago

I also have an idea concerning the remaining set of duplicate particles, I will test it next week.

VictorForouhar commented 2 months ago

This new version results in slightly more mergers relative to the original version. Below is the number of mergers at a given snapshot. Overall, 25 more objects are identified as merged by snapshot 12 in the new version, an increase of 0.6%. image

VictorForouhar commented 2 months ago

Running a test with the asserts added in 3c7967e results in no crashes, indicating the core calculations and phase-space distances are the same the old and new versions. Prior to ad15256, they asserts would crash. I will now remove those temporary asserts.

VictorForouhar commented 2 months ago

I will re-run a test to see how the merger rates vary between old and new versions. In particular, see how https://github.com/SWIFTSIM/HBTplus/pull/32#issuecomment-2082136274 changes now that the spatial distance calculations are done correctly.

VictorForouhar commented 2 months ago

After the PCB correction, they are now in better agreement (note difference in y-axis scale of bottom plot). The number of merged objects is the same in both runs. image

VictorForouhar commented 2 months ago

Looking into differences in merging of matched objects. Some appear to be disrupted in the old but merged in the new (first plot). Others have experienced mergers in the old but not in the new (second plot). I now believe my previous statement concerning the updating the phase-space properties after unbinding a single object vs all can also suppress mergers. It all depends on where the newly accreted particles shift the core to. image image

VictorForouhar commented 2 months ago

First example shown above is caused due to a small difference in Nbound of the parent (+1 particle is bound), and the PhaseSpaceDistance goes from > 2 to < 2.

VictorForouhar commented 2 months ago

Unit test for phase-space calculations passes.

VictorForouhar commented 2 months ago

Doing one last test based on COLIBRE, but I am happy for the code to be reviewed.

jchelly commented 2 months ago

The original version of the code sometimes outputs subhalos which satsify the merging criteria but have not been merged. I guess that should be eliminated in this branch? If a merger produces an object which itself satisfies the condition to merge then that object will be merged.

In any case the plot I used to find those cases (a scatterplot of the normalized separation in position vs velocity) might be a good sanity check for this branch. I'll see if I can find the script and run it on this and master.

VictorForouhar commented 2 months ago

I believe so, yes. The previous version could only handle overlaps with the information of the first round of unbinding. Now, if a merger causes an object to overlap with another due to the accreted particles of the merged child, it will be handled automatically.

jchelly commented 2 months ago

Running this on the colibre test with gcc address sanitizer it stops with a heap buffer overflow error at subhalo_tracking.cpp:679. There's a full stack trace (although with lots of repetition due to MPI) in /cosma7/data/dp004/jch/HBT-colibre/merging/remove_duplicate_particles_in_mergers/logs/job.HBT.6937740.out. I'll try it in DDT.

VictorForouhar commented 2 months ago

Running with address sanitizer now works after 200d30f. Will submit a FLAMINGO test now, which also crashed last night at this point.

VictorForouhar commented 2 months ago

The memory issues we currently have appear to be within the merger tree routines. Whilst that is getting solved, I will work on getting the last mechanism for particles duplication under control.

VictorForouhar commented 1 month ago

Made the suggested changes. @robjmcgibbon

jchelly commented 1 month ago

If I run the colibre test with asserts enabled I get a crash with this branch:

HBT: /cosma7/data/dp004/jch/HBT/crash/HBTplus/src/subhalo_tracking.cpp:1173: void SubhaloMasker_t::MaskTopBottom(HBTInt, std::vector<Subhalo_t>&, const MappedIndexTable_t<long int, long int>&): Assertion `insert_status.second == true' failed.
VictorForouhar commented 1 month ago

Okay, thanks for flagging. Weird that it was not triggered before... could it be some of the latest commits?

jchelly commented 1 month ago

It was processing snapshot 4:

saving 4086 subhalos to ./test_output//003
Conversion factor from SWIFT length units to 0.681 Mpc/h = 1
Conversion factor from SWIFT mass units to 6.81e+09 Msun/h = 1
Conversion factor from SWIFT velocity units to 1 km/s = 1
Null group ID is 2147483647
Number of ranks per node reading simultaneously is 28
33223360 particles loaded at Snapshot 4(36)
10103 groups loaded at snapshot 4(36)
HBT: /cosma7/data/dp004/jch/HBT/crash/HBTplus/src/subhalo_tracking.cpp:1173: void SubhaloMasker_t::MaskTopBottom(HBTInt, std::vector<Subhalo_t>&, const MappedIndexTable_t<long int, long int>&): Assertion `insert_status.second == true' failed.
VictorForouhar commented 1 month ago

I ran it with the last commit of the branch doing the masking, de59340d623044424df2ee9e2c258926aaf22c4c, and it did not crash. Merging these two branches must have caused the issue.

VictorForouhar commented 1 month ago

Which is weird, since that branch had already incorporated changes in the merging algorithm.

VictorForouhar commented 1 month ago

Branch immediately after merging worked. The issue is likely caused by the subsequent commits.

VictorForouhar commented 1 month ago

Latest commit fixes the assert failure. Essentially, I was calling the MergeTo routine twice if it ran in DEBUG mode.