There is a memory leak in the run_estimation function. It appears to happen in the for loop over the different cells. Freeing memory by explicitly deleting the objects created within this loop did not help. Up to 100 MB of additional memory is retained per iteration of the loop. This is a bug, because nothing needs to persist from one cell to the next.
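For context, this is roughly the kind of cleanup that was attempted inside the loop; the names (fit_single_cell, cells) are placeholders for illustration, not ATTRICI's actual code:

```python
import gc

def fit_single_cell(cell):
    """Placeholder for the per-cell estimation; returns (model, trace)."""
    return object(), object()

cells = range(10)  # placeholder cell list

for cell in cells:
    model, trace = fit_single_cell(cell)
    # ... results for this cell are written out here ...
    del model, trace   # explicit deletion of the per-cell objects
    gc.collect()       # force garbage collection
    # in the real run, memory use still grows by up to ~100 MB per iteration
```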
Some further research indicated that this problem might be rooted in the Theano package [1, 2]. That might also be the reason it does not come up in the updated PyMC version.
An open question is why this only came up now and not in earlier runs of the ATTRICI-PYMC3 code. I believe the reason is that we did not run enough cells within a single node for the memory to fill up.
A quick fix is to not run too many cells within one node. However, this is not a permanent solution, because it is quite likely that others who want to use ATTRICI will run into the same problem, and they probably do not have the same setup with a "standby queue" where parallelization over many independent jobs is possible.
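As a sketch of that quick fix, the cells could be split into chunks and each chunk submitted as its own job; the chunk size and variable names below are illustrative, not ATTRICI settings:

```python
cells = list(range(200))   # placeholder list of cell indices
MAX_CELLS_PER_JOB = 50     # illustrative cap per job

chunks = [cells[i:i + MAX_CELLS_PER_JOB]
          for i in range(0, len(cells), MAX_CELLS_PER_JOB)]

# Each chunk would then be submitted as a separate job, so every job
# starts as a fresh Python process and its memory is released on exit.
```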
A proper solution might be to set up the PyMC model only once and reuse it for subsequent cells [see discussion in 1]. This might also give a considerable performance improvement, because the model does not need to be recompiled for every cell.
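A minimal sketch of that idea, assuming a simple illustrative regression model and placeholder load_cell_data / cells helpers (ATTRICI's actual model and data handling are more involved): the per-cell data is swapped in via theano.shared variables, so the model is constructed only once.

```python
import numpy as np
import pymc3 as pm
import theano

# Placeholders for illustration; ATTRICI's real data loading and model differ.
def load_cell_data(cell, n=100):
    rng = np.random.default_rng(cell)
    x = rng.normal(size=n)
    return x, 2.0 * x + rng.normal(size=n)

cells = range(5)

# Shared variables let new cell data be swapped in without rebuilding
# the Theano graph for every cell.
x_shared = theano.shared(np.zeros(100))
y_shared = theano.shared(np.zeros(100))

with pm.Model() as model:
    intercept = pm.Normal("intercept", mu=0, sigma=10)
    slope = pm.Normal("slope", mu=0, sigma=10)
    noise = pm.HalfNormal("noise", sigma=1)
    pm.Normal("obs", mu=intercept + slope * x_shared, sigma=noise,
              observed=y_shared)

for cell in cells:
    x, y = load_cell_data(cell)
    x_shared.set_value(x)
    y_shared.set_value(y)
    with model:
        # the model (and its Theano graph) is built only once above;
        # how much recompilation is avoided depends on PyMC3/Theano caching
        trace = pm.sample(500, tune=500, progressbar=False)
```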
Maybe this is also why the code is faster with the updated PyMC, because the compilation overhead is reduced. But this is just speculation for now.
I have added the output of memory_profiler below.
Below is the run_estimation.py code I used for profiling.