Open Gabriel-p opened 10 months ago
Thanks @Gabriel-p for reporting this.
So are you saying you cannot provide the script for us to test to reproduce the results? It would be good to confirm it on another installation.
Let me see if I can clean it up and reduce the number of files to the minimum required
Ok, here's the compressed file with everything needed to reproduce the issue. You'll need a conda environment with:
python 3.12.0
pyABC 0.12.13
numpy 1.26.2
scipy 1.11.13
astropy 5.3.4
pandas 2.1.1
fastparquet 2023.10.1
fast_histogram 0.12
Then you just run the test_pyABC.py script, changing lines 90 & 91 to switch between samplers.
Let me know if something does not work.
Ah, perfect, we will have a look at this.
@Gabriel-p I can't reproduce your issue here. How often does this error occur?
Hi @stephanmg, I think I sent the files improperly packaged. I'm not sure whether you managed to run test_pyABC.py; if not, let me know.
I can reproduce the issue 100% of the time, even after restarting the system. Another thing I've noticed is that sometimes the script keeps running in the background even after I close my IDE (Sublime Text).
Yes, please re-package if possible and I will give it another try. Thanks for your patience.
Now it should work: pyABC_test.zip
Hi @Gabriel-p, I can't reproduce it here. I will also assign @arrjon to check the issue.
Ok, I can still reproduce this issue 100% of the time, so let me know what I can do to help.
I checked it now on macOS, and it seems that SingleCoreSampler() is opening more threads than it should. This might explain your issue and seems to be a bug. Using MulticoreEvalParallelSampler(n_procs=1) works as expected.
Hi @Gabriel-p,
could you show the content of OMP_NUM_THREADS, e.g. with echo $OMP_NUM_THREADS?
... and could you try the branch fix_singlecore, and let me know if it works?
echo $OMP_NUM_THREADS returns nothing.
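An empty result means the variable is unset (or empty), so any OpenMP-backed numerical library is free to choose its own thread count. Below is a minimal Python sketch of the same check, plus an optional cap to one thread as a diagnostic step. The helper names describe_omp_threads and cap_omp_threads are hypothetical, and capping threads is an assumption to test, not a confirmed fix:

```python
import os


def describe_omp_threads(env=os.environ):
    """Return the OMP_NUM_THREADS value, or "unset" if missing or empty."""
    value = env.get("OMP_NUM_THREADS")
    return "unset" if not value else value


def cap_omp_threads(n=1, env=os.environ):
    """Set OMP_NUM_THREADS to n unless the shell already exported a value."""
    env.setdefault("OMP_NUM_THREADS", str(n))
    return env["OMP_NUM_THREADS"]


# Same information as `echo $OMP_NUM_THREADS`, but from inside Python.
print("OMP_NUM_THREADS is", describe_omp_threads())

# Diagnostic only (assumption): uncomment at the very top of the script.
# cap_omp_threads(1)
```

The cap is only likely to have an effect if it runs before numpy, scipy, or pyabc are imported, because thread pools typically read the variable when the library is loaded.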
This is the output to screen with the fix_singlecore branch and sampler=pyabc.sampler.MulticoreEvalParallelSampler(n_procs=1):
ABC.Sampler INFO: Parallelize sampling on 1 processes.
ABC.Sampler INFO: Parallelize sampling on 1 processes.
ABC.History INFO: Start <ABCSMC id=5, start_time=2024-02-06 08:38:41>
ABC.History INFO: Start <ABCSMC id=5, start_time=2024-02-06 08:38:41>
ABC INFO: Calibration sample t = -1.
ABC INFO: Calibration sample t = -1.
ABC INFO: t: 0, eps: 1.32229323e-01.
ABC INFO: t: 0, eps: 1.32229323e-01.
ABC INFO: Accepted: 500 / 1031 = 4.8497e-01, ESS: 5.0000e+02.
ABC INFO: Accepted: 500 / 1031 = 4.8497e-01, ESS: 5.0000e+02.
ABC INFO: t: 1, eps: 1.00988341e-01.
ABC INFO: t: 1, eps: 1.00988341e-01.
ABC INFO: Accepted: 500 / 972 = 5.1440e-01, ESS: 4.2383e+02.
ABC INFO: Accepted: 500 / 972 = 5.1440e-01, ESS: 4.2383e+02.
ABC INFO: t: 2, eps: 8.23765786e-02.
ABC INFO: t: 2, eps: 8.23765786e-02.
ABC INFO: Accepted: 500 / 1098 = 4.5537e-01, ESS: 4.1058e+02.
ABC INFO: Accepted: 500 / 1098 = 4.5537e-01, ESS: 4.1058e+02.
ABC INFO: t: 3, eps: 7.20554730e-02.
ABC INFO: t: 3, eps: 7.20554730e-02.
ABC INFO: Accepted: 500 / 1096 = 4.5620e-01, ESS: 4.2701e+02.
ABC INFO: Accepted: 500 / 1096 = 4.5620e-01, ESS: 4.2701e+02.
ABC INFO: t: 4, eps: 6.45272070e-02.
ABC INFO: t: 4, eps: 6.45272070e-02.
ABC INFO: Accepted: 500 / 1144 = 4.3706e-01, ESS: 4.2139e+02.
ABC INFO: Accepted: 500 / 1144 = 4.3706e-01, ESS: 4.2139e+02.
ABC INFO: Stop: Maximum walltime.
ABC INFO: Stop: Maximum walltime.
ABC.History INFO: Done <ABCSMC id=5, duration=0:02:05.371858, end_time=2024-02-06 08:40:47>
ABC.History INFO: Done <ABCSMC id=5, duration=0:02:05.371858, end_time=2024-02-06 08:40:47>
It appears to be running the sampler twice? The RAM usage stays low as expected.
This is the output to screen with the fix_singlecore branch and sampler=pyabc.sampler.SingleCoreSampler():
ABC.History INFO: Start <ABCSMC id=6, start_time=2024-02-06 08:41:40>
ABC.History INFO: Start <ABCSMC id=6, start_time=2024-02-06 08:41:40>
ABC INFO: Calibration sample t = -1.
ABC INFO: Calibration sample t = -1.
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
....
The RAM usage immediately starts climbing.
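The "<function active_count at 0x...>" text in the log is what Python prints when threading.active_count is referenced without being called, so that debug line does not show an actual number. Below is a minimal sketch of a thread report that does call it. thread_report and os_thread_count are hypothetical helpers, not part of pyABC. The threading module only sees Python-level threads, so the second helper reads the OS-level count from /proc/self/status, which is available on Linux:

```python
import threading


def thread_report():
    """Return the number and names of Python-level threads."""
    count = threading.active_count()  # note the parentheses: call the function
    names = [t.name for t in threading.enumerate()]
    return count, names


def os_thread_count():
    """Return the OS-level thread count on Linux, or None if unavailable."""
    try:
        with open("/proc/self/status") as fh:
            for line in fh:
                if line.startswith("Threads:"):
                    return int(line.split()[1])
    except OSError:
        pass
    return None


count, names = thread_report()
print(f"Active Python threads: {count} {names}")
print(f"OS-level threads: {os_thread_count()}")
```

If the OS-level number is much larger than the Python-level one, the extra threads were started by native libraries rather than by Python code.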
Thanks for the information @Gabriel-p. We are still troubleshooting the issue and will push the fix to the fix_singlecore branch for you when it's ready.
@Gabriel-p this might be related to the following issue: https://github.com/ICB-DCM/pyPESTO/issues/1312
Could you please try the fix_singlecore branch again?
@stephanmg I just tested the fix_singlecore branch; the issue is still there.
Thanks for testing so quickly. I had hoped the issue would go away in light of this, but it seems we need to dig deeper.
Bug description
When I use pyabc.ABCSMC() with sampler=pyabc.sampler.SingleCoreSampler(), the RAM usage will sometimes grow until all available RAM is consumed. This happens rarely, but I tested it enough times to reproduce it. The issue goes away if I instead use sampler=pyabc.sampler.MulticoreEvalParallelSampler(n_procs=1).
Script with sampler=pyabc.sampler.SingleCoreSampler()
Exact same script but using sampler=pyabc.sampler.MulticoreEvalParallelSampler(n_procs=1)
Expected behavior
Not use all the RAM.
To reproduce
I can't; my script is very large and the issue does not happen all the time.
Environment
elementary OS 7.1 (based on Ubuntu 22.04.3 LTS); Linux 6.5.0-14-generic
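The RAM growth described in this report can be quantified from inside the script, which makes runs with the two samplers easier to compare. Below is a minimal sketch using only the standard library on a Unix-like system. peak_rss_mib and run_and_report are hypothetical helpers, and the list allocation is a stand-in for the actual ABCSMC run, which is not reproduced here:

```python
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of the current process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def run_and_report(label, workload):
    """Run workload() and print the peak RSS before and after it."""
    before = peak_rss_mib()
    result = workload()
    after = peak_rss_mib()
    print(f"{label}: peak RSS {before:.1f} MiB -> {after:.1f} MiB")
    return result, before, after


# Stand-in workload; in the real script this would wrap the sampler run.
_, before, after = run_and_report("demo", lambda: [0] * 2_000_000)
```

This reports a peak for the current process only, so memory used by worker processes of a multicore sampler is not included in the figure.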