Closed JasperSolt closed 1 month ago
Hi Jasper,
Thanks for raising this issue, can you share the scripts and parameters you use to generate the halo populations? The sampler has been tested up to box sizes of 1Gpc under a few minimum masses, but it's possible there is some other parameter incompatibility we have missed.
Yes, here is my python script:
import os
import glob
import argparse
import numpy as np
from datetime import datetime
import py21cmfast as p21c
from py21cmfast.inputs import AstroParams
from py21cmfast import global_params
from scipy.stats import truncnorm
def get_truncated_normal(mean=0, sd=1, low=0, upp=10):
return truncnorm(
(low - mean) / sd, (upp - mean) / sd, loc=mean, scale=sd)
'''
#if USE_MASS_DEPENDENT_ZETA=True
bound_dict = {
'F_STAR10': (0.0, 0.2),
'ALPHA_STAR':(-0.5, 1.0),
'F_ESC10': (0.0, 0.5),
'ALPHA_ESC': (-1.0, 0.5),
'M_TURN': (8.0, 10.0),
'L_X': (38, 42),
't_STAR': (0.1, 1.0),
'NU_X_THRESH': (100, 1500)
}
'''
bound_dict = {}
def random_astro_params():
astro_params = AstroParams._defaults_
for key, val in bound_dict.items():
mu, sigma = AstroParams._defaults_[key], val[1]-val[0]
X = get_truncated_normal(mean=mu, sd=sigma, low=val[0], upp=val[1])
astro_params.update({key: X.rvs()})
return astro_params
#set up parser
parser = argparse.ArgumentParser()
parser.add_argument('-l', '--box_length', type=int)
parser.add_argument('-s', '--seed', type=int)
parser.add_argument('-smm', '--sampler_min_mass', type=float)
args = parser.parse_args()
#Simulation meta parameters
n_samples = 1
n_voxels = args.box_length #512
random_seed = args.seed
smm=args.sampler_min_mass * 1e6
vox_length = 3.9
sim_volume = vox_length*n_voxels
sim_name = "smm_test"
user_params = {"HII_DIM": n_voxels, "BOX_LEN": sim_volume, "USE_INTERPOLATION_TABLES":True}
lightcone_quantities = ('brightness_temp', 'density', 'xH_box', 'halo_mass')
#Format Cache
local_dir = f'/users/jsolt/scratch'
cache_path_format = f'{local_dir}/21cmFAST_cache'
cache_path = f'{cache_path_format}'
save_path = f'/users/jsolt/data/jsolt/halo_lightcones_bigmem_{sim_name}'
if not os.path.exists(save_path):
os.mkdir(save_path)
if not os.path.exists(cache_path):
os.mkdir(cache_path)
p21c.config['direc'] = cache_path
#run sims
start_time = datetime.now()
flag_options = p21c.inputs.FlagOptions(
USE_HALO_FIELD=True, #this makes it DexM
HALO_STOCHASTICITY=True, #this uses the new beta branch halo sampler
USE_MASS_DEPENDENT_ZETA=True
)
astro_params = random_astro_params()
with global_params.use(sampler_min_mass=smm):
#initial conditions
print("Initial Conditions")
init_box = p21c.initial_conditions(
user_params=user_params,
random_seed=random_seed
)
'''
#testing halo sampler
print("Halo Sampler")
halolist_init = p21c.determine_halo_list(redshift=6.0,
init_boxes=init_box,
astro_params=astro_params,
flag_options=flag_options,
random_seed=random_seed)
print(halolist_init.buffer_size)
print(halolist_init.n_halos)
'''
lightcone = p21c.run_lightcone(
redshift = 6.0,
max_redshift = 16.0,
astro_params = astro_params,
flag_options = flag_options,
lightcone_quantities = lightcone_quantities,
global_quantities = lightcone_quantities,
init_box=init_box,
)
lightcone.save(direc=save_path)
end_time = datetime.now()
print(f"*** TOTAL TIME: {end_time - start_time}***")
edit: apologies, this whole time I've been saying box length when I meant HII dim
I was able to gain access to the large memory nodes on Brown's compute cluster to test larger box sizes with smaller 'SAMPLER_MIN_MASS' values. At SAMPLER_MIN_MASS=1e8 and HII_DIM=256, still no halos are being found (see attachment) Sampler min mass 1e8.pdf
I am testing this but be aware that with a cell size of 4 Mpc and a HII_DIM
of 256, so your box is above 1Gpc on a side and your catalogues will be huge with such a low minimum mass. If it's a memory issue I would naively expect this to crash immediately when it tries to allocate a box but this may have different behaviour depending on your system. Can you try the following steps:
DEBUG=TRUE
and examine the logs, there should be a message from the sampler with the expected memory usage of a single catalogue.buffer_size
attribute of the catalog objects, this is the size of each array it allocates (mutliply by 6 for the full catalog size: 3xposition, mass, stars, and sfr)global_params.MAXHALO_FACTOR
(this is a safety factor for array allocation which increases memory usage). if you need the smaller halos there is a global parameter AVG_BELOW_SAMPLER
which effectively puts the average amount of halos below your sampler resolution into each cell in the HaloBox
without putting them in HaloField
or PerturbHaloField
I've done the last two things. It does crash when trying to allocate too much memory, and I have a decent understanding of how to manage that by now.
The issue is, values of minimum mass that produce reasonable halo fields on smaller boxes fail to produce any halos on larger boxes. I'm not sure how this could be a memory issue, given that it DOES fail when I attempt to run lightcones with too high a box size / too low a minimum halo mass. See the pdf I attached to the original issue for a visual summary of the problem.
The linked pull request fixes some indexing issues with very large halo catalogues, I have run a test with your script above at HII_DIM
of 64, 128, 192 and 256 to verify that the halo boxes are not zero everywhere. Due to the memory requirements I was not able to do more extensive testing, so I will leave this open for a little while so you can verify that this fixes your specific issue.
Will do. Testing might take me a bit but I'll attempt to record what I find and post here.
Ok, I've since been able to test running a box with an HII_DIM
of 128 and 192 for a lightcone with SAMPLER_MIN_MASS = 1.5e8
on the bigmem nodes with 1.5Tb of available memory. HII_DIM=128
runs fine while HII_DIM=192
segfaults. This is not an issue of not enough memory, as my job report states it only used around 130Gb of the 1.5Tb available.
It's definitely not running out of memory, ~300GB was what I used with that box size. Since it's running on my end I'll need a bit more info. Can you run with the debug logs on (compile with either DEBUG=TRUE
or LOG_LEVEL=DEBUG
) and see where it crashes?
One other possibility is the caching, if you generated a HaloField
with the previous version, it might still be loading in the corrupted boxes, if you have the same parameter set. You can either clear the cache or pass regenerate=True
into run_lightcone
to avoid reading from the cache.
Sorry--how do I compile with DEBUG=TRUE
? I'm using the python wrapper and am not very familiar with C.
No worries. You set the environment variable in your terminal when calling your install script e.g DEBUG=TRUE pip install ...
or DEBUG=TRUE conda install ...
. This turns on the logging statements from the C backend, as well as turning off some math optimisations. If you just want the logs use LOG_LEVEL=DEBUG
instead of DEBUG=TRUE
Additionally, there are logs in the python wrapper which you can access by adding the following lines at the start of the script you use to run 21cmfast.
import logging
logging.basicConfig(level=logging.DEBUG)
Awesome, thank you. Will do this
Ok, weirdly, reverting to commit e7bf5ba and turning on debug mode seems to have fixed the issue. I'm running sims with an HII_DIM=256
and SAMPLER_MIN_MASS = 1.5e8
without bugs. I'll open a new issue if this comes up again but this seems to be solved?
Hello,
I closed my previous issue as it had to do with memory, but this seems like a separate concern.
When generating lightcones, increasing the box length seems to reduce(?) the number of halos found. I discovered this while doing a parameter sweep of the global parameter 'SAMPLER_MIN_MASS' and found that settings which returned reasonable reionization histories at smaller box lengths would begin to break down at larger box lengths. Seemingly, no halos can be found once the box length exceeds a certain value, all other parameters being equal.
Attached is a doc showing lightcones generated at different values of 'SAMPLER_MIN_MASS' and box lengths. You can see that the larger boxes have a 0 halo field and no ionization history. box length halo sampler test.pdf
If you could look this over that would be appreciated.