Open eatdust opened 3 years ago
Probably https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94109 ? Have you tried other gcc versions?
I have just checked, gfortran-10.3.0 has the fix of #94109 included. Plus, I do not see any leak on CAMB, which suggests that #94109 is fixed for good, hopefully. I'll try to reduce it though, it is not very clear to me where it gets explicitly or implicitly freed.
OK thanks. It should be freed in TTheoryIntermediateCache. CAMBdata stored in info only has allocatable components which should be freed automatically (in recent CAMB versions).
Making progress, Info is in fact already associated when entering the CAMBCalc_GetNewTransferData
routine the first time. This could explain the leak, the explicit allocate then create a new copy of the Info pointer. So, if I remove completely the allocation, everything works fine as well.
Could it be that it is already allocated in CAMBcalc_SetCurrentPoint
?
Hmm not sure. The idea is that when sampling, you make a new point by coping the current one. This new point has a copy of the original data pointer. So in currentPoint, except at the start, Info should already be assigned to the same info as the old point. GetNewTransferData should then make a new instance for the new point's calculation. But it should then be freed consistently if the point is rejected. Do you find memory grows indefinitely?
Do you find memory grows indefinitely?
Exactly! And quite fast, after a thousand calls to CAMB, this is game-over :)
Can you reproduce e.g. with ifort or other gfortran version? I would have though a general issue like that would have been raised long ago (though I've not been using cosmomc myself).
Yes, with gfortran-8.5.0. I was able to reproduce. With the unstable branch of gfortran-11.1.1 20210612, that's actually untestable as I am getting a SIGSEGV at start. This is unrelated though, the SIGSEV error in __fileutils_MOD_writeitemtxt (../camb/forutils/FileUtils.f90:1572).
With ifort 2021.2.0 20210228, it looks fine, memory grows for a while and then it is freed.
I'll be happy to add an explicit deallocate() of the leaking pointer for gfortran, but it is really not clear where it should be freed!
Hmm, I wonder if a new gfortran memory leak bug has been introduced.
The CAMB mem test passes with trunk GNU Fortran (GCC) 12.0.0 20210613 https://travis-ci.com/github/cmbant/CAMB/jobs/515550950 so probably not directly a problem with CAMB dynamic allocatables.
If it were a CosmoMC leak, it should presumably also be there with ifort.
Yes, I agree, I've passed CAMB under valgrind, it is clean with gfortran.
I'll try to understand in more details what is going on with this Info pointer. Removing the allocate stops the leak and I do not see any difference in the behaviour of the chains, with or without. It seems that gfortran makes a spurious extra copy each time its encounter this allocate statement. Tough to solve.
I certainly can't reproduce a gfortran issue on trunk in simple test cases. Perhaps count calls to deallocate and allocate of Info and see if getting imbalanced?
I don't see any very obvious memory issue in 8.5.0 using defaults with test.ini changed to action=0.
Thank you for looking at it! I've just made the same test, with action=0 on test.ini and with 10.3.0, it looks also fine, the memory remains stable!
In my runs, I was using tensors and plik, so I've made another test with test.ini (and also test_planck.ini) having uncommented
compute_tensors = T
param[r] = 0.03 0 2 0.04 0.04
And there, I do see the memory growing slowly (compiled with -fopenmp and running on 48 cores, no MPI). It grows slow, but once used with -DMPI, it became rapidly problematic on mpirun -np 64.
Anyway, good catch, I'll try to narrow it down if this is really related to tensors.
No sure why tensors could be leaking memory - did you find anything else?
Why 64 processes - sounds way too much, or is this using polychord plugin or something else?
I've been busy and did not have time to make more progress, but I will.
For 64, I am not looking for getting margestats only, I generate millions of sample to train a NN for other matters, and multiplying the number of chains, and running cosmomc a long time, is doing the job very well, provided I can keep the memory under control :)
Hi there, tested under gfortran-10.3.0, this "Info" pointer is leaking at each Cls evaluation.
From
Calculator_CAMB.f90
:``
``
A possible fix is to keep the same all along the runs with:
if (.not.associated(Info)) allocate(Info, source=this%DefaultInstance)
or, maybe, writing an explicit deallocation in the Clear procedure, which is currently empty?
In
GeneralType.f90
``
``
Dunno what is best!