ICB-DCM / pyABC

distributed, likelihood-free inference
https://pyabc.rtfd.io
BSD 3-Clause "New" or "Revised" License

RAM usage grows without bound when using `pyabc.sampler.SingleCoreSampler()` #626

Open Gabriel-p opened 10 months ago

Gabriel-p commented 10 months ago

Bug description: When I use `pyabc.ABCSMC()` with `sampler=pyabc.sampler.SingleCoreSampler()`, the RAM usage sometimes grows until all available RAM is consumed. This happens rarely, but I have tested it enough times to reproduce it. The issue goes away if I use `sampler=pyabc.sampler.MulticoreEvalParallelSampler(n_procs=1)` instead.

Script with `sampler=pyabc.sampler.SingleCoreSampler()`:

[Screenshot from 2024-01-16 10:37:50]

Exact same script, but using `sampler=pyabc.sampler.MulticoreEvalParallelSampler(n_procs=1)`:

[Screenshot from 2024-01-16 10:38:11]

Expected behavior: RAM usage should not grow until all available RAM is consumed.

To reproduce: I can't provide a reproducer; my script is very large, and the issue does not happen every time.

Environment

Name: pyabc
Version: 0.12.13
Summary: Distributed, likelihood-free ABC-SMC inference
Home-page: https://github.com/icb-dcm/pyabc
Author: The pyABC developers
Author-email: yannik.schaelte@gmail.com
License: BSD-3-Clause
Location: /home/gabriel/miniconda3/envs/asteca/lib/python3.12/site-packages
Requires: click, cloudpickle, distributed, gitpython, jabbar, matplotlib, numpy, pandas, redis, scikit-learn, scipy, sqlalchemy
Required-by:
/home/gabriel/miniconda3/envs/asteca/bin/python
Python 3.12.0

elementary OS 7.1 (based on Ubuntu 22.04.3 LTS); Linux 6.5.0-14-generic
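The difference between the two screenshots could also be quantified from inside the script instead of from a system monitor. The following is a minimal sketch using only the standard library's `tracemalloc`; `run_step`, `leaky_step` and `clean_step` are hypothetical stand-ins for one unit of work (e.g. one ABC generation) and are not part of pyABC. Note that `tracemalloc` only sees allocations made through Python's allocators, so memory allocated directly by native libraries, or held by other processes, will not show up; an OS-level figure such as the resident set size is more complete.

```python
import tracemalloc
from typing import Callable, List


def track_memory(run_step: Callable[[], None], n_steps: int) -> List[int]:
    """Call ``run_step`` ``n_steps`` times and record, after each call, how
    many bytes traced by tracemalloc are still allocated."""
    tracemalloc.start()
    try:
        usage: List[int] = []
        for _ in range(n_steps):
            run_step()
            current, _peak = tracemalloc.get_traced_memory()
            usage.append(current)
        return usage
    finally:
        tracemalloc.stop()  # also discards the collected traces


def looks_unbounded(usage: List[int], min_growth_bytes: int = 1 << 20) -> bool:
    """Heuristic: report growth if memory at the last step exceeds memory at
    the first step by more than ``min_growth_bytes`` (default 1 MiB)."""
    return len(usage) >= 2 and usage[-1] - usage[0] > min_growth_bytes


# Two hypothetical workloads: one keeps every buffer alive (a leak),
# the other lets each buffer be freed as soon as the step returns.
_kept_alive = []


def leaky_step() -> None:
    _kept_alive.append(bytearray(1_000_000))


def clean_step() -> None:
    bytearray(1_000_000)


if __name__ == "__main__":
    print(looks_unbounded(track_memory(leaky_step, 5)))  # True
    print(looks_unbounded(track_memory(clean_step, 5)))  # False
```

The growth threshold is deliberately an absolute number of bytes, so that small bookkeeping allocations between steps are not mistaken for a leak.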

stephanmg commented 10 months ago

Thanks @Gabriel-p for reporting this.

So are you saying you cannot provide the script for us to test and reproduce the issue? It would be good to confirm it on another installation.

Gabriel-p commented 10 months ago

Let me see if I can clean it up and reduce the number of files to the minimum required.

Gabriel-p commented 10 months ago

Ok, here's the compressed file with everything needed to reproduce the issue. You'll need a conda environment with:

python 3.12.0
pyABC 0.12.13
numpy 1.26.2
scipy 1.11.13
astropy 5.3.4
pandas 2.1.1
fastparquet 2023.10.1
fast_histogram 0.12

Then just run the `test_pyABC.py` script, changing lines 90 and 91 to switch between samplers.

Let me know if something does not work.

pyABC_test.zip
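Editing lines 90 and 91 by hand works; a small alternative for A/B runs is to pick the sampler from an environment variable, so both variants run from the same unmodified script. A minimal sketch: the `PYABC_SAMPLER` variable name and the `"single"`/`"multicore1"` labels are hypothetical choices made up here, while the two sampler constructors are the ones quoted in this issue. `pyabc` is imported lazily, so the selection logic can be exercised without it.

```python
import os
from typing import Mapping, Optional

# Hypothetical labels for the two configurations compared in this issue.
SAMPLER_CHOICES = ("single", "multicore1")


def chosen_sampler_kind(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the (hypothetical) PYABC_SAMPLER variable; default to "single"."""
    env = os.environ if env is None else env
    kind = env.get("PYABC_SAMPLER", "single").strip().lower()
    if kind not in SAMPLER_CHOICES:
        raise ValueError(f"unknown sampler {kind!r}; expected one of {SAMPLER_CHOICES}")
    return kind


def make_sampler(kind: str):
    """Build the pyABC sampler for ``kind`` (requires pyabc to be installed)."""
    if kind not in SAMPLER_CHOICES:
        raise ValueError(f"unknown sampler {kind!r}; expected one of {SAMPLER_CHOICES}")
    import pyabc  # deferred import: only needed once a sampler is actually built

    if kind == "single":
        return pyabc.sampler.SingleCoreSampler()
    return pyabc.sampler.MulticoreEvalParallelSampler(n_procs=1)
```

The script would then pass `sampler=make_sampler(chosen_sampler_kind())` to `pyabc.ABCSMC(...)` and be launched as, for example, `PYABC_SAMPLER=multicore1 python test_pyABC.py`.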

stephanmg commented 10 months ago

Ah, perfect, we will have a look at this.

stephanmg commented 10 months ago

@Gabriel-p I can't reproduce your issue here. How often does this error happen?

Gabriel-p commented 10 months ago

Hi @stephanmg, I think I packaged the files improperly; I'm not sure whether you managed to run `test_pyABC.py`. If not, let me know.

I can reproduce the issue 100% of the time, even after restarting the system. Another thing I've noticed is that sometimes the script keeps running in the background even after I close my IDE (Sublime Text).

stephanmg commented 10 months ago

Yes, please re-package if possible and I will give it another try. Thanks for your patience.

Gabriel-p commented 10 months ago

Now it should work: pyABC_test.zip

stephanmg commented 10 months ago

Hi @Gabriel-p I can't reproduce it here, I will also assign @arrjon to check the issue.

Gabriel-p commented 10 months ago

OK, I can still reproduce this issue 100% of the time, so let me know what I can do to help.

arrjon commented 9 months ago

I checked it now on MacOS, and it seems like SingleCoreSampler() is opening more threads than it should. This might explain your issue and seems to be a bug. Using MulticoreEvalParallelSampler(n_procs=1) works as expected.

stephanmg commented 9 months ago

Hi @Gabriel-p,

could you show the value of `OMP_NUM_THREADS`, e.g. the output of `echo $OMP_NUM_THREADS`?
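In a shell, `echo $OMP_NUM_THREADS` prints an empty line both when the variable is unset and when it is set to an empty string; in either case OpenMP-based libraries typically fall back to their own default thread count, which is implementation-defined. A minimal sketch for checking the variable from inside Python and, as an experiment, pinning common native thread pools. Including `OPENBLAS_NUM_THREADS` and `MKL_NUM_THREADS` alongside `OMP_NUM_THREADS` is an assumption about which BLAS backend numpy/scipy use on a given machine, and the function names are hypothetical; such variables generally only take effect if set before numpy/scipy are first imported.

```python
import os


def omp_setting() -> str:
    """Describe OMP_NUM_THREADS as seen by the current process."""
    value = os.environ.get("OMP_NUM_THREADS")
    if value is None or not value.strip():
        return "unset (runtime default applies)"
    return f"set to {value.strip()}"


def pin_native_threads(n: int = 1) -> None:
    """Ask common native thread pools to use ``n`` threads.

    Must run before numpy/scipy are imported; variables that already hold a
    non-empty value are left untouched.
    """
    for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
        if not os.environ.get(var, "").strip():
            os.environ[var] = str(n)


if __name__ == "__main__":
    print("OMP_NUM_THREADS is", omp_setting())
```

The shell equivalent for a one-off test is `OMP_NUM_THREADS=1 python test_pyABC.py`.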

stephanmg commented 9 months ago

... and could you try the branch fix_singlecore, and let me know if it works?

Gabriel-p commented 9 months ago

echo $OMP_NUM_THREADS returns nothing.

This is the output to screen with the fix_singlecore branch and sampler=pyabc.sampler.MulticoreEvalParallelSampler(n_procs=1):

ABC.Sampler INFO: Parallelize sampling on 1 processes.
ABC.Sampler INFO: Parallelize sampling on 1 processes.
ABC.History INFO: Start <ABCSMC id=5, start_time=2024-02-06 08:38:41>
ABC.History INFO: Start <ABCSMC id=5, start_time=2024-02-06 08:38:41>
ABC INFO: Calibration sample t = -1.
ABC INFO: Calibration sample t = -1.
ABC INFO: t: 0, eps: 1.32229323e-01.
ABC INFO: t: 0, eps: 1.32229323e-01.
ABC INFO: Accepted: 500 / 1031 = 4.8497e-01, ESS: 5.0000e+02.
ABC INFO: Accepted: 500 / 1031 = 4.8497e-01, ESS: 5.0000e+02.
ABC INFO: t: 1, eps: 1.00988341e-01.
ABC INFO: t: 1, eps: 1.00988341e-01.
ABC INFO: Accepted: 500 / 972 = 5.1440e-01, ESS: 4.2383e+02.
ABC INFO: Accepted: 500 / 972 = 5.1440e-01, ESS: 4.2383e+02.
ABC INFO: t: 2, eps: 8.23765786e-02.
ABC INFO: t: 2, eps: 8.23765786e-02.
ABC INFO: Accepted: 500 / 1098 = 4.5537e-01, ESS: 4.1058e+02.
ABC INFO: Accepted: 500 / 1098 = 4.5537e-01, ESS: 4.1058e+02.
ABC INFO: t: 3, eps: 7.20554730e-02.
ABC INFO: t: 3, eps: 7.20554730e-02.
ABC INFO: Accepted: 500 / 1096 = 4.5620e-01, ESS: 4.2701e+02.
ABC INFO: Accepted: 500 / 1096 = 4.5620e-01, ESS: 4.2701e+02.
ABC INFO: t: 4, eps: 6.45272070e-02.
ABC INFO: t: 4, eps: 6.45272070e-02.
ABC INFO: Accepted: 500 / 1144 = 4.3706e-01, ESS: 4.2139e+02.
ABC INFO: Accepted: 500 / 1144 = 4.3706e-01, ESS: 4.2139e+02.
ABC INFO: Stop: Maximum walltime.
ABC INFO: Stop: Maximum walltime.
ABC.History INFO: Done <ABCSMC id=5, duration=0:02:05.371858, end_time=2024-02-06 08:40:47>
ABC.History INFO: Done <ABCSMC id=5, duration=0:02:05.371858, end_time=2024-02-06 08:40:47>

It appears to be running the sampler twice? The RAM usage stays low as expected.
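Every line in the log above appears exactly twice, including `Parallelize sampling on 1 processes.` One possible explanation, not confirmed in this thread, is that the sampler runs once but each log record reaches two handlers, for example when a handler is attached both to a pyABC logger and to one of its ancestors, or when logging is configured twice. A minimal standard-library sketch that reproduces this symptom with a made-up logger hierarchy (`demo_dup` and the function names are hypothetical) and counts how many handlers a record will reach:

```python
import io
import logging
from typing import List


def handlers_in_chain(logger: logging.Logger) -> int:
    """Count the handlers a record logged on ``logger`` is passed to,
    following the same propagation rule the logging module uses."""
    count = 0
    current = logger
    while current is not None:
        count += len(current.handlers)
        if not current.propagate:
            break
        current = current.parent
    return count


def demo_duplicate_lines() -> List[str]:
    """Attach one handler to a parent logger and one to its child, log a
    single message on the child, and return the lines that were written."""
    buffer = io.StringIO()
    fmt = logging.Formatter("%(name)s %(levelname)s: %(message)s")

    parent = logging.getLogger("demo_dup")
    parent.setLevel(logging.INFO)
    parent.propagate = False  # keep the demo away from the root logger
    child = logging.getLogger("demo_dup.Sampler")

    for lg in (parent, child):
        lg.handlers.clear()  # make repeated calls start from a clean state
        handler = logging.StreamHandler(buffer)
        handler.setFormatter(fmt)
        lg.addHandler(handler)

    child.info("Parallelize sampling on 1 processes.")
    return buffer.getvalue().splitlines()
```

In the real script, a value greater than 1 from `handlers_in_chain(logging.getLogger("ABC.Sampler"))` (the logger name as it appears in the output above) would point in this direction rather than at a second sampler run.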

This is the output to screen with the fix_singlecore branch and sampler=pyabc.sampler.SingleCoreSampler():

ABC.History INFO: Start <ABCSMC id=6, start_time=2024-02-06 08:41:40>
ABC.History INFO: Start <ABCSMC id=6, start_time=2024-02-06 08:41:40>
ABC INFO: Calibration sample t = -1.
ABC INFO: Calibration sample t = -1.
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
....

The RAM usage immediately starts climbing.
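The debug output above prints `<function active_count at ...>` instead of a number: the function object `threading.active_count` was printed without being called. Also, `threading.enumerate()` only lists threads known to Python's `threading` module, so it can show just `MainThread` even if native libraries have started extra OS threads. A minimal sketch of both counts; the function names are hypothetical, and the `/proc/self/task` approach is Linux-specific.

```python
import os
import threading
from typing import Optional


def python_thread_count() -> int:
    """Threads known to the ``threading`` module (note the call parentheses:
    printing ``threading.active_count`` itself shows the function object)."""
    return threading.active_count()


def os_thread_count() -> Optional[int]:
    """Linux only: all OS threads of this process, including native threads
    started by C libraries that ``threading`` does not track.
    Returns None where /proc is unavailable (e.g. macOS, Windows)."""
    task_dir = "/proc/self/task"
    if not os.path.isdir(task_dir):
        return None
    return len(os.listdir(task_dir))


if __name__ == "__main__":
    print("Active threads (threading):", python_thread_count())
    print("Active threads (OS):", os_thread_count())
```

Sampling both counts before and after each generation would show whether the OS-level count keeps rising while the `threading` count stays at 1.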

stephanmg commented 9 months ago

Thanks for the information @Gabriel-p - we are currently still troubleshooting the issue. We will push the fix, when it's ready, to the fix_singlecore branch for you.

stephanmg commented 8 months ago

@Gabriel-p might be related to this issue: https://github.com/ICB-DCM/pyPESTO/issues/1312

Could you please try the fix_singlecore branch again?

Gabriel-p commented 8 months ago

@stephanmg I just tested the fix_singlecore branch; the issue is still there.

stephanmg commented 8 months ago

Thanks for testing so quickly. I hoped the issue would go away in light of this; however, it seems we need to dig deeper.