abacusorg / abacusutils

Python code to interface with halo catalogs and other Abacus N-body data products
https://abacusutils.readthedocs.io
GNU General Public License v3.0

Memory requirement for prepare_sim #143

Open epaillas opened 1 month ago

epaillas commented 1 month ago

Hi all,

I was wondering if any of you have successfully generated lightcone subsamples with prepare_sim on Perlmutter for z >= 0.8. The lower redshifts work fine, but at z = 0.8 the memory requirement hits the ~500 GB limit of a Perlmutter CPU node and the code chokes on this step:

compiling compaso halo catalogs into subsampled catalogs
processing slab  0
loading halo catalog 
total number of halos,  40906121 keeping  9132292
masked randoms =  12.467233937923373
Building and querying trees for mass env calculation
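
For reference, the step where it dies follows the usual pattern of building a KD-tree on one set of points and querying it around another. A minimal sketch of that pattern (assuming scipy's cKDTree, with toy sizes; this is not the actual prepare_sim implementation) looks roughly like:

```python
# Minimal sketch of a tree-based environment estimate, NOT the actual
# prepare_sim code; sizes and the query radius are toy values.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(42)
boxsize = 2000.0                 # Mpc/h, illustrative
n_halos = 100_000                # toy value; the real run keeps ~9.1 million
rand_factor = 10                 # randoms per halo, illustrative

halos = rng.uniform(0, boxsize, size=(n_halos, 3))
randoms = rng.uniform(0, boxsize, size=(rand_factor * n_halos, 3))

# Build a tree on the randoms and count how many fall within a fixed
# radius of each halo; such counts can normalize a local environment
# estimate for the mask. Both the tree and the query results cost
# memory proportional to the number of randoms.
tree = cKDTree(randoms)
n_rand_near = tree.query_ball_point(halos, r=10.0, return_length=True)
```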

My configuration file uses

prepare_sim:
    Nparallel_load: 1

(but I don't think this helps, since we are processing a single slab anyway).

Is there any workaround for this? I guess one could try decreasing the number of randoms for the environment calculation, but that number is already low relative to the number of haloes, so I don't know how safe that would be...

Cheers, Enrique

lgarrison commented 1 month ago

I'm not sure, maybe @boryanah has? 500 GB is a lot, maybe we're using more memory than we need to somewhere. It might be worth adding up the expected number of allocations and comparing that to the observed amount (500 GB) to try to understand what's happening.
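
A back-of-envelope version of that accounting might look like the following (the random factor per halo and the tree overhead are assumptions for illustration, not measured values):

```python
# Back-of-envelope accounting of the big allocations; the halo counts
# are from the log above, the rest are illustrative assumptions.
GB = 1024**3

n_halos_total = 40_906_121   # "total number of halos" from the log
n_halos_kept  = 9_132_292    # "keeping" count from the log
rand_factor   = 10           # randoms per halo (hypothetical)
n_randoms     = rand_factor * n_halos_kept

bytes_per_pos = 3 * 8        # x, y, z as float64

est = {
    "halo positions (all)":  n_halos_total * bytes_per_pos,
    "halo positions (kept)": n_halos_kept * bytes_per_pos,
    "random positions":      n_randoms * bytes_per_pos,
    # assume the tree roughly doubles the footprint of the points it indexes
    "tree on randoms (~2x)": 2 * n_randoms * bytes_per_pos,
}
for name, b in est.items():
    print(f"{name:24s} {b / GB:6.2f} GB")
print(f"{'total':24s} {sum(est.values()) / GB:6.2f} GB")
```

With these numbers the named arrays come to only ~7 GB, so if the observed usage is really ~500 GB, the bulk must be coming from somewhere else (temporary copies, per-query result lists, or the like).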

boryanah commented 1 month ago

Hi both,

That might have to do with the recent PR in which we increased the number of randoms for the environment calculation, though as Enrique points out, a lower number might not be sufficient. I have certainly run the subsampling on higher-redshift slices in the past, but perhaps we need to do some profiling to understand what is eating up so much memory now. I am currently on vacation and won't be able to look into this until early August... There are some places where the code could be made more efficient; has @SandyYuan also looked into that recently?

Thanks, Boryana

epaillas commented 1 month ago

Thanks for the quick replies! I don't think the cosmodesi version of abacusutils I've been using has pulled in the latest commits that Pierre implemented (he's in the office next door, so we discussed this a bit :) ), so it looks like the code is stalling even with rand = 10. I'll take a closer look to see if there's anything I can tweak to make it fit on the node.

There's no rush, so please enjoy your vacation!

Cheers,

boryanah commented 1 month ago

@lgarrison @SandyYuan One possible thing we/I could implement is reducing the randoms by a factor of 8 when the observer is in a corner (currently they are generated over the full sky regardless); a rough sketch is below. There are also a couple of local variables that I think create hard copies of the data. There might be other things to do (e.g., for the huge boxes we should be able to express the random density analytically), but perhaps that is enough? I also looked through the rest of prepare_sim and I think there are other places where we could shave off some memory, but that might not be necessary. Let me know what you think.
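
For concreteness, here is a minimal sketch of that octant restriction, assuming the observer sits at a box corner so the light cone covers one octant of the sky; the function name is hypothetical, not the prepare_sim API:

```python
# Hypothetical sketch: draw sky randoms restricted to one octant
# instead of generating full-sky randoms and discarding 7/8 of them.
import numpy as np

def corner_sky_randoms(n, rng=None):
    """Draw n random unit vectors in the x, y, z >= 0 octant (1/8 of
    the full sphere), for an observer at a box corner."""
    rng = np.random.default_rng() if rng is None else rng
    # Uniform on the sphere via normalized Gaussians, then fold into
    # the positive octant with abs(); this preserves uniformity
    # because the eight octants are congruent.
    v = rng.standard_normal((n, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return np.abs(v)

randoms = corner_sky_randoms(1_000_000)
```

Folding with abs() avoids any rejection sampling, so the memory footprint of the randoms drops by the full factor of 8 rather than allocating full-sky points first.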

lgarrison commented 1 month ago

Only generating randoms for the given light cone geometry makes sense to me!