brinckmann / montepython_public

Public repository for the Monte Python Code
MIT License

Restarting montepython producing binary chain files #314

Closed Amlan1996 closed 9 months ago

Amlan1996 commented 1 year ago

Hi,

I tried to restart the chains using the covmat and bestfit files after an initial run with the parameter file 'base2018TTTEEE.param'. Here is the command I used in the job file:

mpirun -np 16 python montepython/MontePython.py run -p chains/planck_new/log.param -o chains/planck_new -c chains/planck_new/planck_new.covmat -b chains/planck_new/planck_new.bestfit --superupdate 50 -N 600000 -r chains/planck_new/2023-02-16_600000__1.txt

But after some time, when I checked the newly generated chain files, I found this:

[amlan@nova planck_new]$ less 2023-02-19_1200000__12.txt
"2023-02-19_1200000__12.txt" may be a binary file.  See it anyway?

As a result, when I try to analyze the chains after a run, I do not get any definitive result. Instead I get this kind of bizarre output:

......0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x006

Also, during my initial run I got an error from CLASS even though I didn't make any modifications to it. Here is the error:

Error in Class: perturbations_init(L:1006) :error in perturbations_solve(ppr, pba, pth, ppt, index_md, index_ic, index_k, pppw[thread]);
=>perturbations_solve(L:3296) :error in perturbations_vector_init(ppr, pba, pth, ppt, index_md, index_ic, k, interval_limit[index_interval], ppw, previous_approx);
=>perturbations_vector_init(L:4386) :condition (ppw->approx[ppw->index_ap_tca] == (int)tca_off) is true; scalar initial conditions assume tight-coupling approximation turned on

So it would be great if anyone could help me resolve this problem.

Thanks in advance.

Best regards, Amlan

Amlan1996 commented 1 year ago

Dear @brinckmann ,

It would be great if you could help me with this one like last time.

Best regards, Amlan

brinckmann commented 1 year ago

I have genuinely no idea what is going on here. Maybe the CLASS error is causing something really strange to happen? But I can't really see how. Maybe you can do some testing by trying again to see if the file is already corrupted before the restart, gets corrupted when you restart and the chains are copied, or if it happens later and if so when?
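One way to run that test is to scan the chain directory for NUL bytes at each stage (before the restart, right after it, and later during the run). The following is only a minimal sketch: the helper names `first_nul_offset` and `scan_chains` are hypothetical, and the glob pattern `*__*.txt` is an assumption based on the chain file names quoted in this thread.

```python
from pathlib import Path


def first_nul_offset(path, chunk_size=1 << 20):
    """Return the byte offset of the first NUL byte in the file, or None if there is none."""
    offset = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return None  # reached end of file without finding a NUL byte
            index = chunk.find(b"\x00")
            if index != -1:
                return offset + index
            offset += len(chunk)


def scan_chains(directory, pattern="*__*.txt"):
    """Map each chain file name to its first NUL offset (None means the file looks clean).

    The pattern is an assumption drawn from names such as
    2023-02-19_1200000__12.txt; adjust it if your files are named differently.
    """
    return {
        path.name: first_nul_offset(path)
        for path in sorted(Path(directory).glob(pattern))
    }
```

Calling `scan_chains("chains/planck_new")` at each stage and comparing the results would show which files are affected and roughly where in each file the NUL bytes begin.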

Best, Thejs

brinckmann commented 1 year ago

Also, as a workaround if it's just one chain that has a problem, you can analyze the other chains without the one with a problem. Probably the easiest is to make a new directory and copy the log.param and all of the good chains to there. If you need to run for longer you can try to rename the files so they're in numerical order, e.g. 1.txt through 15.txt and then run with only 15 chains.
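The copying step could be scripted roughly as follows. This is a sketch under assumptions, not part of MontePython: `has_nul_bytes` and `copy_clean_chains` are hypothetical helpers, "good" is taken to mean "contains no NUL bytes", and the `*__*.txt` pattern is inferred from the file names in this thread. It does not rename the chains.

```python
import shutil
from pathlib import Path


def has_nul_bytes(path, chunk_size=1 << 20):
    """Return True if the file contains at least one NUL byte."""
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            if b"\x00" in chunk:
                return True
    return False


def copy_clean_chains(source_dir, target_dir, pattern="*__*.txt"):
    """Copy log.param and every NUL-free chain file into target_dir.

    Returns the sorted list of chain file names that were copied.
    Assumption: a chain is usable if it contains no NUL bytes.
    """
    source, target = Path(source_dir), Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source / "log.param", target / "log.param")
    copied = []
    for chain in sorted(source.glob(pattern)):
        if not has_nul_bytes(chain):
            shutil.copy2(chain, target / chain.name)
            copied.append(chain.name)
    return copied
```

For example, `copy_clean_chains("chains/planck_new", "chains/planck_new_clean")` would leave a directory that can be analyzed on its own (the second path is just an illustrative name).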

Best, Thejs

Amlan1996 commented 1 year ago

The files are not corrupted before the restart: after the first run, the analysis of the chains runs smoothly and gives all the results and plots. The problem starts as soon as I restart the chains. Right after the restart, most of the chain files, if not all, become binary files. Though I am not an expert, I think that after the restart each new chain file starts by copying all the output from the corresponding file of the previous run, and that something goes wrong during this copying that turns the result into a binary file.
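A rough way to check this copying hypothesis is to compare each original chain file with the start of its restarted counterpart. The sketch below assumes the restart copies the old chain byte for byte to the beginning of the new file, which is not confirmed in this thread, so a mismatch alone does not prove corruption. `first_difference` is a hypothetical helper name.

```python
def first_difference(original_path, restarted_path, chunk_size=1 << 20):
    """Return the offset of the first byte where the restarted file stops
    matching the original, or None if the restarted file begins with the
    full contents of the original.

    Assumption: the restart is expected to copy the old chain verbatim.
    """
    offset = 0
    with open(original_path, "rb") as original, open(restarted_path, "rb") as restarted:
        while True:
            expected = original.read(chunk_size)
            if not expected:
                return None  # every byte of the original was matched
            actual = restarted.read(len(expected))
            if expected != actual:
                for i, (x, y) in enumerate(zip(expected, actual)):
                    if x != y:
                        return offset + i
                # No differing byte in the overlap: the restarted file is shorter.
                return offset + len(actual)
            offset += len(expected)
```

With the files from the original post, this would be called as `first_difference("chains/planck_new/2023-02-16_600000__1.txt", restarted_file)`, where `restarted_file` is the new chain started from it. A returned offset inside the copied part would point at the copy step, while `None` would suggest the copy is fine and the problem arises in the points written afterwards.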

So if this copying really happens, then perhaps modifying the restart and copying step in the MontePython code would solve my problem, though I am not sure.