21cmfast / 21cmFAST

Official repository for 21cmFAST: a code for generating fast simulations of the cosmological 21cm signal
MIT License
56 stars 37 forks source link

[BUG] NaNs in brightness_temp #400

Open Kojobu opened 1 month ago

Kojobu commented 1 month ago

Describe the bug: When running the simulator with a specific set of parameters, there are singular NaN values. When slightly alternating one of the parameters, the NaNs disappear. Furthermore, the NaN values are exactly the HII_DIM size away from each other in the z dimension. E.g. the minimal working example from below yields NaNs at

x = array([26, 26, 43, 43, 43]) 
y = array([50, 50,  0,  0,  0])
z = array([ 88, 208,  38, 158, 278]))

Note, that this issue does not occur if USE_TS_FLUCT is set to false.

To Reproduce:

import py21cmfast as p21c

user_params = p21c.UserParams(
    HII_DIM=120, BOX_LEN=200, NO_RNG=True
)
flag_options = p21c.FlagOptions(
    INHOMO_RECO=True, USE_TS_FLUCT=True
)

astro_params = p21c.AstroParams(
    **{'ION_Tvir_MIN': 4.798687324224536, 
    'HII_EFF_FACTOR': 153.93554078107047, 
    'L_X': 41.07914816577325, 
    'NU_X_THRESH': 122.31064697905447}
)

lightcone = p21c.run_lightcone(
    direc='_cache',
    user_params=user_params,
    flag_options=flag_options,
    astro_params=astro_params,
    redshift = 5.5
)

print(np.where(np.isnan(lightcone.brightness_temp)==True))

Please note that the brightness_temp does not look physical due to NO_RNG = True. It is only set to True for the minimal working example given above. This is a necessary condition to reproduce the error. Although setting the random_seed argument, the brightness_map will differ from run to run if NO_RNG = False. Furthermore, the crash parameters are not unique. There are other combinations for which the error occurs. They may differ radically and show no obvious coherence.

Expected behavior: A brightness map without NaNs or at least a warning that this particular choice of parameters causes trouble.

Details: OS: Linux (behavior is reproducible on local hardware and cluster) Python version: 3.12.2 and 3.11.2 Package version: pip version from pypi

Additional context The error occurs during MCMC inference. Each step, the four astro parameters are sampled uniformly and the simulator is called as a sample from the likelihood. The ranges from which the parameters were sampled are:

ON_Tvir_MIN: [4,5.3]
HII_EFF_FACTOR: [10, 250]
L_X : [38,42]
NU_X_THRESH: [100, 1500]
steven-murray commented 1 month ago

Thanks @Kojobu for the detailed report. I will try to reproduce.

steven-murray commented 1 month ago

I have reproduced the issue with the given script, thank you! Actually, with those parameters I do get NaNs, but only two of them (not 5).

I turned on LOG_LEVEL=DEBUG when building the package, and I found that the NaN's are only in the TsBox.Ts_box, and only enter between redshifts 6.7681003887088105 and 6.6157848372239005.

I have not dug any deeper at this point. I wonder if @daviesje might be able to help here as well.

daviesje commented 2 weeks ago

@steven-murray I'm not 100% sure if this is the issue, but something I've encountered before is that the density field can contain values of exactly -1. See PerturbField.c

https://github.com/21cmfast/21cmFAST/blob/aefc27314b0603d2523808502833189c6e2169f0/src/py21cmfast/src/PerturbField.c#L595-L597

Since there's a strict inequality in the conditional, it can miss some values. Also later on in the same file:

https://github.com/21cmfast/21cmFAST/blob/aefc27314b0603d2523808502833189c6e2169f0/src/py21cmfast/src/PerturbField.c#L654-L660

Density of -1 results in kinetic temperature being zero, and spin temperature being NaN. In v4-prep I have a fix in SpinTemperature.c, but I think if this is what causes the issue the best fix is to change the two above srict inequalities to <=.

steven-murray commented 2 weeks ago

Sounds good, thanks @daviesje. I'll go and implement those changes, and see if it fixes the problem

steven-murray commented 2 weeks ago

Hmmm, this didn't seem to fix the problem.