21cmfast / 21cmFAST

Official repository for 21cmFAST: a code for generating fast simulations of the cosmological 21cm signal
MIT License

Issue with halo sampler scaling with box length #384

Closed JasperSolt closed 1 month ago

JasperSolt commented 2 months ago

Hello,

I closed my previous issue as it had to do with memory, but this seems like a separate concern.

When generating lightcones, increasing the box length seems to reduce(?) the number of halos found. I discovered this while doing a parameter sweep of the global parameter 'SAMPLER_MIN_MASS' and found that settings which returned reasonable reionization histories at smaller box lengths would begin to break down at larger box lengths. Seemingly, no halos can be found once the box length exceeds a certain value, all other parameters being equal.

Attached is a doc showing lightcones generated at different values of 'SAMPLER_MIN_MASS' and box lengths. You can see that the larger boxes have a zero halo field and no ionization history.

Attachment: box length halo sampler test.pdf

If you could look this over that would be appreciated.

daviesje commented 2 months ago

Hi Jasper,

Thanks for raising this issue, can you share the scripts and parameters you use to generate the halo populations? The sampler has been tested up to box sizes of 1Gpc under a few minimum masses, but it's possible there is some other parameter incompatibility we have missed.

JasperSolt commented 2 months ago

Yes, here is my python script:

import os
import argparse
from datetime import datetime

import py21cmfast as p21c
from py21cmfast.inputs import AstroParams
from py21cmfast import global_params
from scipy.stats import truncnorm

def get_truncated_normal(mean=0, sd=1, low=0, upp=10):
    return truncnorm(
        (low - mean) / sd, (upp - mean) / sd, loc=mean, scale=sd)

'''
#if USE_MASS_DEPENDENT_ZETA=True
bound_dict = {
                'F_STAR10': (0.0, 0.2),
                'ALPHA_STAR':(-0.5, 1.0),
                'F_ESC10': (0.0, 0.5),
                'ALPHA_ESC': (-1.0, 0.5),
                'M_TURN': (8.0, 10.0),
                'L_X': (38, 42),
                't_STAR': (0.1, 1.0),
                'NU_X_THRESH': (100, 1500)
             }
'''
bound_dict = {}

def random_astro_params():
    # Work on a copy so we don't mutate AstroParams._defaults_ in place
    astro_params = dict(AstroParams._defaults_)
    for key, (low, upp) in bound_dict.items():
        mu, sigma = astro_params[key], upp - low
        X = get_truncated_normal(mean=mu, sd=sigma, low=low, upp=upp)
        astro_params[key] = X.rvs()
    return astro_params

#set up parser
parser = argparse.ArgumentParser()
parser.add_argument('-l', '--box_length', type=int)
parser.add_argument('-s', '--seed', type=int)
parser.add_argument('-smm', '--sampler_min_mass', type=float)
args = parser.parse_args()

#Simulation meta parameters
n_samples = 1
n_voxels = args.box_length  # cells per side (HII_DIM), despite the argument name; e.g. 512
random_seed = args.seed
smm = args.sampler_min_mass * 1e6
vox_length = 3.9  # Mpc per cell
sim_volume = vox_length * n_voxels  # BOX_LEN in Mpc
sim_name = "smm_test"
user_params = {"HII_DIM": n_voxels, "BOX_LEN": sim_volume, "USE_INTERPOLATION_TABLES": True}
lightcone_quantities = ('brightness_temp', 'density', 'xH_box', 'halo_mass')

#Format Cache
local_dir = '/users/jsolt/scratch'
cache_path = f'{local_dir}/21cmFAST_cache'
save_path = f'/users/jsolt/data/jsolt/halo_lightcones_bigmem_{sim_name}'

os.makedirs(save_path, exist_ok=True)
os.makedirs(cache_path, exist_ok=True)

p21c.config['direc'] = cache_path

#run sims
start_time = datetime.now()

flag_options = p21c.inputs.FlagOptions(
                                        USE_HALO_FIELD=True, #this makes it DexM
                                        HALO_STOCHASTICITY=True, #this uses the new beta branch halo sampler
                                        USE_MASS_DEPENDENT_ZETA=True
                                    )
astro_params = random_astro_params()

with global_params.use(sampler_min_mass=smm):

    #initial conditions
    print("Initial Conditions")
    init_box = p21c.initial_conditions(
                                    user_params=user_params,
                                    random_seed=random_seed
                                    )
    '''
    #testing halo sampler
    print("Halo Sampler")
    halolist_init = p21c.determine_halo_list(redshift=6.0,
                                         init_boxes=init_box,
                                         astro_params=astro_params,
                                         flag_options=flag_options,
                                         random_seed=random_seed)
    print(halolist_init.buffer_size)
    print(halolist_init.n_halos)
    '''

    lightcone = p21c.run_lightcone(
                                    redshift = 6.0,
                                    max_redshift = 16.0,
                                    astro_params = astro_params,
                                    flag_options = flag_options,
                                    lightcone_quantities = lightcone_quantities,
                                    global_quantities = lightcone_quantities,
                                    init_box=init_box,
                                )
    lightcone.save(direc=save_path)

end_time = datetime.now()
print(f"*** TOTAL TIME: {end_time - start_time}***")

edit: apologies, this whole time I've been saying box length when I meant HII_DIM
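For what it's worth, the get_truncated_normal helper in the script can be sanity-checked on its own. This is a minimal sketch assuming scipy is available; the 0.05 default for F_STAR10 is just an illustrative value, not the actual AstroParams default:

```python
# Standalone sanity check of the truncated-normal helper used in the script.
from scipy.stats import truncnorm

def get_truncated_normal(mean=0, sd=1, low=0, upp=10):
    # truncnorm takes its bounds in units of standard deviations from loc
    return truncnorm((low - mean) / sd, (upp - mean) / sd, loc=mean, scale=sd)

# Hypothetical draw: F_STAR10 around an assumed default of 0.05, bounded to (0.0, 0.2)
X = get_truncated_normal(mean=0.05, sd=0.2, low=0.0, upp=0.2)
samples = X.rvs(size=10000, random_state=42)
print(samples.min() >= 0.0, samples.max() <= 0.2)  # True True
```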

JasperSolt commented 2 months ago

I was able to gain access to the large memory nodes on Brown's compute cluster to test larger box sizes with smaller 'SAMPLER_MIN_MASS' values. At SAMPLER_MIN_MASS=1e8 and HII_DIM=256, still no halos are being found (see attachment) Sampler min mass 1e8.pdf

daviesje commented 2 months ago

I am testing this, but be aware that with a cell size of ~4 Mpc and an HII_DIM of 256, your box is above 1 Gpc on a side, so your catalogues will be huge with such a low minimum mass. If it were a memory issue I would naively expect it to crash immediately when it tries to allocate a box, but this may behave differently depending on your system. Can you try the following steps:
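The box-size arithmetic is quick to verify without 21cmFAST installed (a back-of-envelope check using the script's parameters):

```python
# Physical box side = cell size x cells per side.
cell_size_mpc = 3.9   # the script's vox_length
hii_dim = 256
box_len_mpc = cell_size_mpc * hii_dim
print(box_len_mpc)  # 998.4 -> just under 1 Gpc with 3.9 Mpc cells
                    # (a shade over with the ~4 Mpc quoted above)
```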

JasperSolt commented 2 months ago

I've done the last two things. It does crash when trying to allocate too much memory, and I have a decent understanding of how to manage that by now.

The issue is, values of minimum mass that produce reasonable halo fields on smaller boxes fail to produce any halos on larger boxes. I'm not sure how this could be a memory issue, given that it DOES fail when I attempt to run lightcones with too high a box size / too low a minimum halo mass. See the pdf I attached to the original issue for a visual summary of the problem.
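To first order the expected number of halos above a fixed mass scales with comoving volume, so a larger box at fixed cell size should contain more halos, not fewer. A rough illustration (pure Python, ignoring cosmic variance and mass-function details):

```python
# At fixed cell size, comoving volume (and hence the expected halo count
# above a fixed mass) scales as HII_DIM^3: doubling HII_DIM gives ~8x halos.
cell = 3.9  # Mpc per cell, as in the script
side_small = cell * 128
side_large = cell * 256
v_small = side_small * side_small * side_small
v_large = side_large * side_large * side_large
print(v_large / v_small)  # 8.0 -- so an empty halo field on the larger box
                          # points to a bug, not to sampling statistics
```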

daviesje commented 2 months ago

The linked pull request fixes some indexing issues with very large halo catalogues. I have run a test with your script above at HII_DIM of 64, 128, 192 and 256 to verify that the halo boxes are not zero everywhere. Due to the memory requirements I was not able to do more extensive testing, so I will leave this open for a little while so you can verify that it fixes your specific issue.

JasperSolt commented 2 months ago

Will do. Testing might take me a bit but I'll attempt to record what I find and post here.

JasperSolt commented 2 months ago

OK, I've since been able to test running a box with an HII_DIM of 128 and 192 for a lightcone with SAMPLER_MIN_MASS = 1.5e8 on the bigmem nodes with 1.5 TB of available memory. HII_DIM=128 runs fine, while HII_DIM=192 segfaults. This is not an issue of insufficient memory, as my job report states it only used around 130 GB of the 1.5 TB available.

daviesje commented 2 months ago

It's definitely not running out of memory; ~300 GB was what I used with that box size. Since it runs on my end, I'll need a bit more info. Can you run with the debug logs on (compile with either DEBUG=TRUE or LOG_LEVEL=DEBUG) and see where it crashes?

One other possibility is caching: if you generated a HaloField with the previous version under the same parameter set, it might still be loading in the corrupted boxes. You can either clear the cache or pass regenerate=True to run_lightcone to avoid reading from the cache.
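For the manual route, clearing the cache directory can be sketched with the standard library alone. This is a minimal sketch under assumptions: that 'direc' is the path assigned to p21c.config['direc'] in the script above, and that cached boxes sit as flat .h5 files in that directory (the helper name is hypothetical):

```python
import glob
import os

def clear_21cmfast_cache(direc):
    """Delete cached .h5 boxes so the next run regenerates everything.

    Assumes a flat directory of .h5 files, which is an illustrative
    simplification of the wrapper's cache layout.
    """
    removed = 0
    for path in glob.glob(os.path.join(direc, "*.h5")):
        os.remove(path)
        removed += 1
    return removed
```

Passing regenerate=True to run_lightcone, as mentioned above, avoids reading cached boxes without deleting them.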

JasperSolt commented 2 months ago

Sorry--how do I compile with DEBUG=TRUE? I'm using the python wrapper and am not very familiar with C.

daviesje commented 1 month ago

No worries. You set the environment variable in your terminal when calling your install script, e.g. DEBUG=TRUE pip install ... or DEBUG=TRUE conda install .... This turns on the logging statements from the C backend, as well as turning off some math optimisations. If you just want the logs, use LOG_LEVEL=DEBUG instead of DEBUG=TRUE.

Additionally, there are logs in the python wrapper which you can access by adding the following lines at the start of the script you use to run 21cmfast.

import logging
logging.basicConfig(level=logging.DEBUG)

JasperSolt commented 1 month ago

Awesome, thank you. Will do this

JasperSolt commented 1 month ago

OK, weirdly, reverting to commit e7bf5ba and turning on debug mode seems to have fixed the issue. I'm running sims with HII_DIM=256 and SAMPLER_MIN_MASS = 1.5e8 without bugs. I'll open a new issue if this comes up again, but this seems to be solved?