I kept running into memory issues with a test data set I am using. After reading the SPAdes manual for release 3.15.2, which qiime2-shotgun-2023.9 uses:
> SPAdes uses 512 Mb per thread for buffers, which results in higher memory consumption. If you set memory limit manually, SPAdes will use smaller buffers and thus less RAM.
I think my issue was not realizing the increased memory usage incurred by using multiple threads. I am in the process of validating this now.
If a user specifies 32 cores, they'll use ~16 GB of RAM for buffers alone. This is analogous to `feature-classifier`, in which memory usage grows with thread count. Conversely, a user may specify too little memory for anything to run. For example, setting the maximum memory limit to 100 GB while using 16 threads means much smaller buffers, i.e. less RAM per thread.
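For reference, the buffer arithmetic above works out like this. This is a rough back-of-the-envelope sketch based only on the manual's stated "512 Mb per thread" default, not on SPAdes internals:

```python
# Estimate RAM consumed by SPAdes per-thread buffers at the default size.
# The 512 MB/thread figure comes from the SPAdes 3.15.2 manual; actual
# peak memory will be higher, since buffers are only part of the total.
BUFFER_MB_PER_THREAD = 512


def default_buffer_ram_gb(threads: int) -> float:
    """Approximate buffer RAM (GB) when no memory limit is set manually."""
    return threads * BUFFER_MB_PER_THREAD / 1024


print(default_buffer_ram_gb(32))  # 32 threads -> 16.0 GB just for buffers
print(default_buffer_ram_gb(16))  # 16 threads -> 8.0 GB
```

This is why the 32-core case lands at ~16 GB before the assembly itself has allocated anything.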
Perhaps update the help text like so:
`--p-threads`: Number of threads. By default SPAdes uses 512 Mb per thread for buffers, which results in higher memory consumption. This can be further affected by the `--p-memory` option.
`--p-memory`: RAM limit for SPAdes in Gb (terminates if exceeded). If a smaller memory limit is set, SPAdes will use smaller buffers and thus less memory per thread set via `--p-threads`.
Is it easier for everyone if I post these types of suggestions as issues like this, or should I wait, compile a set of these suggestions, and submit them as a PR? I've not dived into the code yet, so I figured I'd recommend these simple fixes as I work through testing the tools. I'd imagine they're easy enough to fold into any existing PRs.