@epaillas, feel free to jump in here too, since this is probably related to #143 (although it isn't explicitly related to whether one is using halo light cones).
`prepare_sim` is using a lot of memory right now. In particular, `do_Menv_from_tree()` uses 100+ GB per process with default settings on an AbacusSummit sim. I think the culprit is the `r_outer` tree query: https://github.com/abacusorg/abacusutils/blob/229fe9a3e5787355bc5109a6346ec17a9a796414/abacusnbody/hod/prepare_sim.py#L304

I think the problem started in #78, when the default `r_outer` went from 5 to 10 Mpc/h. But I can't tell why from the PR.

@SandyYuan, @boryanah: do you know if we need 10 Mpc/h as the outer radius?
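For a sense of why the change in #78 matters: at roughly uniform density, the number of neighbours within `r` grows as r³, so doubling `r_outer` from 5 to 10 Mpc/h multiplies the number of matches by about 8. The sketch below checks this on a hypothetical uniform random catalogue in a periodic box (not AbacusSummit data; real halos are clustered, so the exact factor will differ). It uses the real `return_length=True` option of `query_ball_point`, which returns counts instead of index lists.

```python
import numpy as np
from scipy.spatial import KDTree

# Hypothetical uniform "halo" positions in a periodic box (illustration only).
rng = np.random.default_rng(42)
box = 100.0
pos = rng.uniform(0.0, box, size=(20_000, 3))
tree = KDTree(pos, boxsize=box)


def n_pairs(r):
    # return_length=True gives per-point neighbour counts without building
    # index lists; subtract one per point to drop the self-match at d = 0.
    counts = tree.query_ball_point(pos, r, return_length=True)
    return int(counts.sum()) - len(pos)


ratio = n_pairs(10.0) / n_pairs(5.0)
print(f"pairs(r=10) / pairs(r=5) = {ratio:.2f}")  # expect roughly (10/5)**3 = 8
```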
For context, we're trying to compute the sum of masses of neighbor halos in two apertures of different radius and then take the difference. The inner aperture seems okay; it's just the outer one that uses a ton of memory to store all the indices. It finds over 2 billion matches, which should only be ~16 GB as 8-byte integers, but we're hitting over 100 GB, maybe because these are Python lists. Still not 100% clear on why...
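A back-of-the-envelope check on the 16 GB vs 100+ GB gap. `query_ball_point` returns the neighbours of an array of points as lists of Python ints, and each list entry costs an 8-byte pointer plus an int object. The sketch assumes a 64-bit CPython build and that each entry is a distinct int object (CPython only caches small ints, and halo indices are mostly large).

```python
import sys

n_matches = 2_000_000_000  # "over 2 billion matches" for the r_outer query

# Packed int64 array: 8 bytes per index.
int64_gb = n_matches * 8 / 1e9

# Python list of Python ints: an 8-byte pointer stored in the list plus a
# separate int object per entry (assumed distinct, see lead-in).
bytes_per_entry = 8 + sys.getsizeof(123_456_789)
python_gb = n_matches * bytes_per_entry / 1e9

print(f"int64 array:  {int64_gb:.0f} GB")  # 16 GB
print(f"Python lists: {python_gb:.0f} GB (lower bound, no list headers)")
```

On 64-bit CPython an int below 2³⁰ is 28 bytes, so this lands around 72 GB before counting the per-halo list objects and any spare capacity the lists hold, which makes 100+ GB plausible.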
Really, the ideal algorithm wouldn't store the indices at all. We would just add up the mass on the fly as we encounter each valid pair in the tree query. But `scipy.KDTree` doesn't seem to support that. Maybe another library does? Or maybe we could use a different Menv metric?
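True per-pair accumulation inside the tree walk isn't something `scipy.spatial.KDTree` exposes, as far as I know, but a cheaper stopgap that keeps the same Menv definition is to query in chunks and reduce each chunk to masses before moving on. This is a sketch under assumptions: `menv_chunked` and its `chunk` parameter are hypothetical, it assumes Menv is the outer-aperture mass minus the inner-aperture mass in a periodic box, and it is not the code in `prepare_sim.py`.

```python
import numpy as np
from scipy.spatial import KDTree


def menv_chunked(pos, mass, r_inner, r_outer, boxsize, chunk=100_000):
    """Mass of neighbours in the shell r_inner < d <= r_outer around each point.

    Hypothetical helper. Only one chunk's neighbour-index lists are alive at a
    time, so peak memory scales with `chunk` rather than the whole catalogue.
    """
    pos = np.asarray(pos, dtype=np.float64)
    mass = np.asarray(mass, dtype=np.float64)
    tree = KDTree(pos, boxsize=boxsize)  # periodic box, coords in [0, boxsize)
    menv = np.empty(len(pos), dtype=np.float64)

    for start in range(0, len(pos), chunk):
        stop = min(start + chunk, len(pos))
        # Object arrays holding one list of neighbour indices per query point.
        outer = tree.query_ball_point(pos[start:stop], r_outer)
        inner = tree.query_ball_point(pos[start:stop], r_inner)
        for i, (idx_out, idx_in) in enumerate(zip(outer, inner)):
            # Outer sum minus inner sum = mass in the shell; the self-match
            # (d = 0) is in both apertures and cancels.
            menv[start + i] = mass[idx_out].sum() - mass[idx_in].sum()
        # `outer`/`inner` are rebound on the next iteration, so the previous
        # chunk's lists can be freed.
    return menv
```

The `chunk` size trades peak memory against Python-loop overhead; the per-halo loop is still pure Python, so a compiled kernel or a different library would be needed to get the fully on-the-fly version.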