BellaNasirudin closed this issue 5 years ago.
Hi Bella,
This reminds me that I still need to add parameter ranges to the documentation!
For ION_Tvir_MIN, you should set the range to [4.0, 6.0]. It can go below 4.0 (i.e. Tvir below 10^4 K), but this generates a discontinuity in Tvir, because the code switches the gas type it assumes when converting Tvir to a halo mass. Using this range is also consistent with all our other works that use ION_Tvir_MIN.
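A minimal sketch of why the jump appears. It assumes the standard virial scaling M_vir ∝ (T_vir/μ)^{3/2} at fixed redshift, and assumes the gas-type switch amounts to changing the mean molecular weight μ from 1.22 (neutral gas) to 0.59 (ionized gas) at log10(Tvir) = 4. The constants and function names are illustrative, not taken from 21cmFAST:

```python
# Illustrative constants (assumed, not read from 21cmFAST):
MU_NEUTRAL = 1.22        # mean molecular weight of neutral primordial gas
MU_IONIZED = 0.59        # mean molecular weight of ionized primordial gas
LOG10_T_SWITCH = 4.0     # assumed log10(Tvir / K) at which the gas type switches


def mean_molecular_weight(log10_tvir):
    """Return the assumed mu for a halo with the given log10 virial temperature."""
    return MU_NEUTRAL if log10_tvir < LOG10_T_SWITCH else MU_IONIZED


def relative_halo_mass(log10_tvir):
    """Halo mass up to a redshift-dependent constant: M ∝ (T_vir / mu)**1.5."""
    t_vir = 10.0 ** log10_tvir
    return (t_vir / mean_molecular_weight(log10_tvir)) ** 1.5


# Ratio of the mass at the switch to the mass just below it: ~(1.22 / 0.59)**1.5.
jump = relative_halo_mass(4.0) / relative_halo_mass(3.9999)

# Inside the recommended range [4.0, 6.0] the relation is smooth and monotonic.
grid = [4.0 + 0.1 * i for i in range(21)]
masses = [relative_halo_mass(x) for x in grid]

print(f"mass jump across log10(Tvir) = 4: {jump:.2f}")
```

Under these assumptions the halo mass jumps by roughly (1.22/0.59)^{3/2} ≈ 3 across log10(Tvir) = 4, while staying smooth over [4.0, 6.0].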
@BradGreig along with adding parameter ranges to the documentation, we should also investigate this specific case and have the C code return a sensible exception when we know a parameter combination will fail. This will allow the MCMC sampler to catch it, return -inf and continue, rather than crash.
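A minimal sketch of that pattern on the Python side, assuming the C failure is surfaced as a Python exception. `ModelError` and `run_model` are hypothetical placeholders, not part of the 21cmFAST or 21CMMC API:

```python
import math


class ModelError(RuntimeError):
    """Hypothetical exception raised when the C backend reports a known failure."""


def run_model(params):
    """Hypothetical stand-in for the expensive 21cmFAST call.

    Raises ModelError for a parameter region the backend is known not to handle.
    """
    if params["ION_Tvir_MIN"] < 4.0:
        raise ModelError("ION_Tvir_MIN below 4.0 hits the Tvir discontinuity")
    # Toy Gaussian log-likelihood centred on an arbitrary value.
    return -0.5 * ((params["ION_Tvir_MIN"] - 4.7) / 0.1) ** 2


def log_likelihood(params):
    """Return -inf instead of crashing when the model reports a known failure."""
    try:
        return run_model(params)
    except ModelError:
        return -math.inf
```

Ensemble samplers such as emcee treat a log-probability of -inf as zero posterior probability and reject that proposal, so the chain carries on instead of the whole run crashing. Only the specific, expected exception is caught, so genuine bugs still surface.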
This is related to #19.
Just to add to this, in case our prior ranges are equally flawed.
I have some students running the larger parameterisation (i.e. that of Park+ 2018, but without spin temperature) who are experiencing a similar problem. Our params dictionary looks like this:
FitParams = dict(F_STAR10 = [-1.3, -3.0, 0.0, 0.1], ALPHA_STAR = [0.5, -0.5, 1.0, 0.05], F_ESC10 = [-1.0, -3.0, 0.0, 0.1], ALPHA_ESC = [-0.5, -1.0, 0.5, 0.05], M_TURN = [8.7, 8.0, 10.0, 0.1], t_STAR = [0.5, 0.0, 1.0, 0.05])
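A quick sanity check on a dictionary like this, assuming each list follows the 21CMMC convention of [starting value, lower bound, upper bound, width]. `check_fit_params` is a hypothetical helper, not part of 21CMMC:

```python
FitParams = dict(
    F_STAR10=[-1.3, -3.0, 0.0, 0.1],
    ALPHA_STAR=[0.5, -0.5, 1.0, 0.05],
    F_ESC10=[-1.0, -3.0, 0.0, 0.1],
    ALPHA_ESC=[-0.5, -1.0, 0.5, 0.05],
    M_TURN=[8.7, 8.0, 10.0, 0.1],
    t_STAR=[0.5, 0.0, 1.0, 0.05],
)


def check_fit_params(params):
    """Return a list of problems, assuming [start, lower, upper, width] entries."""
    problems = []
    for name, (start, lower, upper, width) in params.items():
        if not lower < upper:
            problems.append(f"{name}: lower bound {lower} is not below upper bound {upper}")
        elif not lower <= start <= upper:
            problems.append(f"{name}: start {start} lies outside [{lower}, {upper}]")
        if width <= 0:
            problems.append(f"{name}: width {width} must be positive")
    return problems


for problem in check_fit_params(FitParams):
    print(problem)
```

For the dictionary above the check reports nothing, so every starting point lies inside its prior. It cannot, however, rule out regions of the prior volume where the model itself fails, which is the issue discussed in this thread.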
With 8 threads, the code consistently crashes after ~40 iterations, sometimes raising the same exception described above. However, they have observed that if they run in batches of 30 iterations, continuing the sampling from the previous batch each time, it stalls much less often. I wonder whether there might be a memory leak of some sort, on top of the discontinuity issues in some regions of parameter space? Apologies if this memory leak has already been raised elsewhere.
@caw11 I did find a memory leak issue a while back, and I think I fixed it (see https://github.com/BradGreig/Hybrid21CM/issues/29#issuecomment-476333842). However, with these kinds of issues it can be hard to be certain. What your students are getting certainly smells like a memory leak. There's a script in the devel/ directory called memory_leak_test.py which you could try running with a bit of modification to suit your purposes, to check if there's a memory leak. If you get the time to do that, definitely let us know the outcome!
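If the devel/ script does not fit, a minimal Python-side check is to call the model repeatedly and watch the process's peak resident memory, which (unlike `tracemalloc`) also counts allocations made inside C extensions. This sketch uses the Unix-only `resource` module; `peak_rss_trace` and `looks_leaky` are hypothetical helpers, and the 1 MiB tolerance is an arbitrary choice:

```python
import resource
import sys


def peak_rss_kib():
    """Peak resident set size of this process so far, normalised to KiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in KiB; macOS reports it in bytes.
    return peak // 1024 if sys.platform == "darwin" else peak


def peak_rss_trace(fn, n_calls):
    """Call fn() n_calls times, recording peak RSS (KiB) after each call."""
    samples = []
    for _ in range(n_calls):
        fn()
        samples.append(peak_rss_kib())
    return samples


def looks_leaky(samples, tolerance_kib=1024):
    """Flag a leak if peak RSS keeps climbing after the first (warm-up) call."""
    return samples[-1] - samples[0] > tolerance_kib
```

In practice, `fn` would wrap a single likelihood evaluation at fixed parameters. A trace that keeps rising after the first call points to a leak, while a flat trace suggests memory is being released between calls.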
I have been (and still am) experiencing a memory leak issue as well. I am using commit 450d0966cd8b92beae0052f41466dd605da313ae.
In my case, it seems to come from the storage module in 21CMMC: when I opted out of storage, the code ran through perfectly fine and used only 100 GB. The downside, of course, is that I cannot analyse the walkers if something goes wrong with the MCMC, and will have to run everything again.
With the storage module included, I am using at least 915 GB, so it is very expensive to run.
@BellaNasirudin thanks for the report. This seems to be a different issue than the one you originally posted here. Can you file a separate issue for it, and create a minimum working example? When you say "storage" module, do you mean storing the arbitrary derived data?
@caw11 your original issue has moved to https://github.com/21cmfast/21cmFAST/issues/16.
@caw11 and @BellaNasirudin the memory leak issue has been moved to https://github.com/21cmfast/21CMMC/issues/4
I tried a 4-parameter run with these values:
But I am getting this error:
which is caused by one of these parameters:
I then ran Hybrid21CM again with these parameters and got this error for the values in bold above: