dereneaton / ipyrad

Interactive assembly and analysis of RAD-seq data sets
http://ipyrad.readthedocs.io
GNU General Public License v3.0

Tetrad cannot allocate memory #461

Closed: alexkrohn closed this issue 2 years ago

alexkrohn commented 2 years ago

Is there a way to limit tetrad's memory usage? When I run tetrad on a large number of quartets (50e6 of 74e6 total), I get this error about 8 hours into estimating the full tree (with an average of 3645 quartets per tree). I'm running this on an Ubuntu 16.04 server with 125 GB of RAM and 80 cores (SSE4.2).
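For scale: tetrad works on quartets (four-sample subsets), and a dataset of n samples has C(n, 4) = n(n-1)(n-2)(n-3)/24 of them, so the total grows roughly as n^4. The sketch below assumes the "74e6 total" above is exactly this combinatorial count (rounded) and back-solves the sample count it implies, which comes out on the order of 200 samples. The helper names are hypothetical and not part of ipyrad.

```python
import math


def n_quartets(n_samples: int) -> int:
    """Number of distinct 4-sample subsets (quartets): C(n, 4)."""
    return math.comb(n_samples, 4)


def samples_for_quartets(total_quartets: int) -> int:
    """Smallest sample count whose full quartet set reaches `total_quartets`.

    A simple linear scan; fast enough for datasets of a few hundred samples.
    """
    n = 4
    while n_quartets(n) < total_quartets:
        n += 1
    return n


if __name__ == "__main__":
    total = 74_000_000    # "74e6 total" from the question (rounded)
    sampled = 50_000_000  # "50e6" quartets requested for the run
    n = samples_for_quartets(total)
    print(f"~{n} samples -> {n_quartets(n):,} quartets")
    print(f"fraction sampled: {sampled / total:.2f}")
```

Because the count is quartic in n, adding even a few dozen samples multiplies the number of quartets, and therefore the work spread across the cluster.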

Thanks,

Alex

Error:

Encountered an Error.
Message: OSError: [Errno 12] Cannot allocate memory
Use debug flag (-d) for full code traceback.
Error: ipcluster shutdown and must be restarted
isaacovercast commented 2 years ago

Did you try reducing the number of cores? This will reduce the number of running jobs and by extension increase the amount of RAM available to each job. Try cutting the number of cores in half. It'll run longer but at least it won't crash. There's not really a way to limit memory usage for any given job, so limiting the number of concurrent jobs is the only way.
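The trade-off is simple division: all running engines draw on the same pool of RAM, so halving the engine count roughly doubles each job's share. A minimal sketch with this thread's numbers (125 GB, 80 cores), assuming memory is shared evenly, which real jobs only approximate. The per-job peak of 3 GB is a made-up placeholder you would replace with a value observed on your own data (e.g. in `top`), and the function names are hypothetical.

```python
def ram_per_engine_gb(total_ram_gb: float, n_engines: int,
                      reserve_gb: float = 0.0) -> float:
    """Average RAM available to each engine if memory is shared evenly.

    `reserve_gb` is headroom kept back for the OS, the notebook kernel, etc.
    """
    if n_engines < 1:
        raise ValueError("need at least one engine")
    return (total_ram_gb - reserve_gb) / n_engines


def max_engines(total_ram_gb: float, peak_job_gb: float,
                n_cores: int, reserve_gb: float = 0.0) -> int:
    """Largest engine count whose jobs can all hit `peak_job_gb` at once.

    Capped at the number of cores; returns 0 if even one job would not fit.
    """
    fit = int((total_ram_gb - reserve_gb) // peak_job_gb)
    return max(0, min(fit, n_cores))


if __name__ == "__main__":
    for n in (80, 40):
        print(f"{n} engines -> {ram_per_engine_gb(125, n):.2f} GB each")
    # 3.0 GB per job is a hypothetical placeholder, not a measured value
    print("engines that fit:", max_engines(125, peak_job_gb=3.0, n_cores=80))
```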

alexkrohn commented 2 years ago

I didn't see that flag/option in the cookbook https://ipyrad.readthedocs.io/en/master/API-analysis/cookbook-tetrad.html, so I wasn't sure it existed. What is the command? I'm running this interactively in Jupyter Notebook. Thanks!

In reply to Isaac Overcast's comment of Thu, Sep 16, 2021, 9:16 AM.


isaacovercast commented 2 years ago

Yes. If you follow the notebook and call tet.run(auto=True), tetrad auto-launches an ipcluster instance that uses all cores by default. If you don't want this, or you want to control the number of cores, you need to do something like this:

On the command line, launch an ipcluster instance with 40 cores:

ipcluster start -n 40 --cluster-id=tetrad --daemonize

In the notebook, tell tetrad to use your externally launched ipcluster:

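import ipyparallel as ipp
# connect to the ipcluster started above with --cluster-id=tetrad
ipyclient = ipp.Client(cluster_id="tetrad")
# tetrad distributes its jobs across only those 40 engines
tet.run(ipyclient=ipyclient)
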
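If you'd rather drive everything from Python, the ipcluster commands can be assembled as argument lists and passed to subprocess.run. This is a hedged sketch: the helper names are hypothetical, the code only builds the commands, and it assumes ipcluster is on your PATH. Running ipcluster stop with the same --cluster-id shuts the daemonized cluster down when tetrad finishes, or before restarting it after a crash like the one reported above.

```python
import shlex
from typing import List


def ipcluster_start_cmd(n_engines: int, cluster_id: str = "tetrad") -> List[str]:
    """Argument list for a daemonized ipcluster with `n_engines` engines."""
    if n_engines < 1:
        raise ValueError("need at least one engine")
    return ["ipcluster", "start", "-n", str(n_engines),
            f"--cluster-id={cluster_id}", "--daemonize"]


def ipcluster_stop_cmd(cluster_id: str = "tetrad") -> List[str]:
    """Argument list that stops the cluster started with the same id."""
    return ["ipcluster", "stop", f"--cluster-id={cluster_id}"]


if __name__ == "__main__":
    # Print the shell-equivalent commands; pass the lists to
    # subprocess.run(..., check=True) to actually execute them.
    print(shlex.join(ipcluster_start_cmd(40)))
    print(shlex.join(ipcluster_stop_cmd()))
```

With --daemonize the engines register asynchronously, so a Client created immediately after starting may see fewer than 40 engines; checking len(ipyclient) before calling tet.run is a cheap sanity check.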
alexkrohn commented 2 years ago

Thanks for that. It's running now -- let's see if it runs out of memory again.

alexkrohn commented 2 years ago

Awesome! It seems to be running. At 8 hours per bootstrap and 100 bootstraps, it will only take about 33 days to analyze this 30-million-quartet subset of my dataset 😅 I'll go ahead and close this now.
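The runtime projection is hours per replicate times number of replicates, divided by 24. A small sketch, assuming replicates run one after another at a constant ~8 hours each (real times will vary with quartet sampling and server load), with an optional extra run for the initial full tree; the function name is hypothetical.

```python
def estimated_days(hours_per_replicate: float, n_bootstraps: int,
                   include_full_tree: bool = False) -> float:
    """Wall-clock days for sequential tetrad replicates at a fixed rate."""
    n_runs = n_bootstraps + (1 if include_full_tree else 0)
    return hours_per_replicate * n_runs / 24


if __name__ == "__main__":
    print(f"{estimated_days(8, 100):.1f} days")                          # bootstraps only
    print(f"{estimated_days(8, 100, include_full_tree=True):.1f} days")  # plus full tree
```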