Closed JasperSolt closed 2 months ago
I ran the halo finder on an initial conditions box with BOX_LEN=32 to try to troubleshoot the issue. The generated halo field reported 'buffer_size': 68467197 and 'n_halos': 27140773, i.e. the buffer overshoots the actual number of halos by roughly 150%. Would it be possible to add a parameter to set the buffer size manually, or just to reduce the buffer significantly? The unused reserved memory makes generating larger cubes impossible.
Over the weekend I did a bit more testing: I ran halo finding on boxes of increasing size to see when the memory overhang became too great. For each job I requested 250 GB of memory on a compute node on Brown University's CSGrid cluster. I also estimated how much memory the halo list should theoretically use, with the back-of-the-envelope calculation memory = buffer_size * 4 bytes (int32) * 3 coordinates. Note that I also tinkered with how buffer_size is estimated in the Python wrapper, so don't put too much stock in the absolute values listed here.
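For reference, the back-of-the-envelope estimate above can be sketched in a few lines (assuming the buffer holds only one int32 per coordinate, which is the simplification I used):

```python
def estimate_halo_list_bytes(buffer_size: int) -> int:
    """Back-of-the-envelope halo-list memory: one int32 per (x, y, z) coordinate."""
    bytes_per_coord = 4  # int32
    n_coords = 3         # x, y, z
    return buffer_size * bytes_per_coord * n_coords

# The BOX_LEN=32 buffer of 68,467,197 entries works out to ~0.8 GB:
print(estimate_halo_list_bytes(68_467_197) / 1e9)  # ~0.82 GB
```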
Here's what came of it:
**Test 1**
- box_len = 128
- buffer_size = 2,190,950,249
- estimated memory = ~26 GB
- n_halos = 1,738,884,214
- status: completed

**Test 2**
- box_len = 192
- buffer_size = 9,859,276,307
- estimated memory = ~118 GB
- n_halos = 5,868,611,828
- status: completed

**Test 3**
- box_len = either 256 or 224 (forgot to write it down, oops)
- buffer_size = 15,656,164,876
- estimated memory = ~188 GB
- status: killed
It seems at least believable to me that Test 3 would fail, given that the estimated memory cost of the halo list alone is only ~60 GB short of the requested 250 GB. But maybe my perception is skewed because I'm used to dealing with such large in-memory objects, and gigabyte values have lost all meaning to me. Wouldn't be the first time.
My question is: does the halo list need to be this long? Is there a way to make the halo sampler "coarser-grained" (for lack of a better term) for larger simulations? Other semi-numerical sims I've worked with in the past that use halo finders don't have nearly this much memory overhead, and there's only so much memory I can request.
I have since discovered the sampler_min_mass parameter in global_params. Apologies for the foolishness!
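For anyone else who lands here: a toy illustration of why a minimum-mass cut shrinks the halo list so dramatically. This is not 21cmFAST's actual sampler, just an assumed power-law mass function dn/dM ∝ M⁻², for which the number of halos above M_min scales as 1/M_min, so raising the cut by 10× drops the count (and hence the buffer) by roughly 10×:

```python
import random

# Toy model (not 21cmFAST's actual sampler): draw halo masses from an
# assumed power-law mass function dn/dM ~ M^-2 on [1e8, 1e13] via
# inverse-transform sampling, then count survivors of a minimum-mass cut.
random.seed(0)
m_lo, m_hi = 1e8, 1e13
n_draw = 100_000

def draw_mass() -> float:
    u = random.random()
    # Inverse CDF of dn/dM ~ M^-2: 1/M = 1/m_lo - u * (1/m_lo - 1/m_hi)
    return 1.0 / (1.0 / m_lo - u * (1.0 / m_lo - 1.0 / m_hi))

masses = [draw_mass() for _ in range(n_draw)]
n_above = sum(m >= 1e9 for m in masses)
# Raising the minimum mass from 1e8 to 1e9 keeps only ~10% of the halos.
print(n_above / n_draw)
```

Low-mass halos dominate the counts under any bottom-heavy mass function, so even a modest increase in the minimum sampled mass cuts the list length by an order of magnitude.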
Hello, I'm running the halo sampler in the v4-prep branch to generate some lightcones, and I'm encountering the following issue: the lightcones I'm attempting to run are 1 Gpc in size, with a box length of 256 and a redshift range of z = 6-16. I know that's large, but even so, 400 GB for a list of halos seems excessive. Is there a way to reduce the size of the halo list?
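For scale, inverting the same 12-bytes-per-halo back-of-the-envelope estimate (4-byte int32 × 3 coordinates, ignoring any per-halo properties beyond position) gives the halo count a 400 GB list implies:

```python
BYTES_PER_HALO = 4 * 3  # one int32 per (x, y, z) coordinate, nothing else

def implied_halo_count(budget_bytes: int) -> int:
    """How many bare (x, y, z) int32 halo entries fit in a given memory budget."""
    return budget_bytes // BYTES_PER_HALO

# A 400 GB halo list corresponds to roughly 3.3e10 buffered halos.
print(implied_halo_count(400 * 10**9))
```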