ritase7e closed this issue 2 years ago.
Hi Rita,
You might want to check whether your parameters are leaving the valid range of CLASS, since they're not restricted by prior limits. Normally that's not a problem for runs including Planck, but since you're adding A_L I'm not sure whether it could happen. At the very least I would expect you to need a lower limit A_L > 0, and that alone might be enough.
Best, Thejs
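A minimal sketch of what such a lower limit does, assuming MontePython's usual entry layout `[mean, min, max, 1-sigma, scale, role]` with `None` for an unbounded side, so the A_L line would read something like `data.parameters['A_L'] = [1, 0, None, 0.1, 1, 'cosmo']` (the mean and sigma are illustrative, not taken from the original file). The `parameters` dict and the `within_bounds` helper are hypothetical stand-ins, not MontePython's own code; they only show that a proposed point outside the prior limits can be rejected before it is ever handed to CLASS.

```python
# Hypothetical sketch of MontePython-style prior limits (not MontePython's code).
# Assumed entry layout: [mean, min, max, 1-sigma, scale, role]; None = no limit.
parameters = {
    "omega_b": [2.2377, None, None, 0.015, 0.01, "cosmo"],  # no limits, for comparison
    "A_L": [1.0, 0.0, None, 0.1, 1, "cosmo"],  # min = 0 keeps A_L non-negative
}


def within_bounds(name, value, params=parameters):
    """Return True if `value` lies inside the [min, max] limits of params[name]."""
    _, lower, upper = params[name][:3]
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


# A proposal that steps below zero is rejected; an unbounded parameter never is.
print(within_bounds("A_L", 1.2))       # True
print(within_bounds("A_L", -0.3))      # False
print(within_bounds("omega_b", -5.0))  # True (no limits set)
```

Without the `0` in the min slot, nothing stops a random step from proposing a negative A_L, which is the kind of point Thejs suspects takes CLASS outside its valid range.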
Hi Thejs,
Thank you so much for your quick reply. It seems that was indeed the issue! I have added a lower limit A_L > 0 and the chain has now been running for 1h40m, which is longer than it ever ran before.
I wonder why this was never an issue with fewer parameters, even when A_L was included, but it definitely seems to have fixed it!
I'll close this issue now; if it turns out not to be fixed, I'll reopen it.
Thanks again! Rita
Hello,
I am trying to run chains with the following 'cosmo' parameters in my .param file:
This is just the 6 LCDM parameters plus A_L, a lensing parameter that is already defined in CLASS under that name. I have not modified CLASS.
In runs with any 6 of these 7 parameters I have no issues: it does not complain about any of the parameters or anything else. However, in runs with all 7 parameters it runs for at most an hour (sometimes much less) and then produces a segmentation fault:
/var/spool/slurm/job07053/slurm_script: line 15: 21290 Segmentation fault (core dumped)
It seems to run fine until that point, as the chain .txt file keeps being filled while it runs. That is the only error it produces, so I don't have any more clues as to what might be causing it.
If I run with mpirun the error is:
mpirun noticed that process rank 3 with PID 19178 on node ftlab21 exited on signal 11 (Segmentation fault).
(Maybe there is some extra information here that could be a clue for someone.)
Finally, if I try to restart the chain, it only computes for a few minutes before giving the same error again. Running a new chain from the same folder (so that it starts from the log.param) does the same most of the time. However, if I run again from the original .param file into a new output folder, it is able to run for about an hour again (sometimes less, but significantly longer than when I try in the same folder).
I am running on a cluster, in case that is relevant. I have asked the IT person for help, and he has checked that it is not a problem with the machine (by running on different machines) or with memory or processing power (by monitoring these as the process runs).
Any help would be appreciated.
I'll leave the full .param file below in case the problem is there, or in case anyone is willing to run it and see whether it produces the same error (it is just base2018TTTEEE_lensing.param with 'A_L' added).
Thanks in advance, Rita Neves
Edit: I tried running again today and the error now gives a little more information (I don't know why), which might mean something:
------------------------ .param file I am using ------------------------