AntonelliLab / seqcap_processor

Bioinformatic pipeline for processing Sequence Capture data for Phylogenetics
MIT License

Change parallel processing for assemble_reads.py #31

Closed: bmichanderson closed this pull request 2 years ago

bmichanderson commented 2 years ago

The previous version of this script would create cores × cores core demands and cores × max_memory memory demands on the system. This change adds an argument, --instances, that hopefully makes this more explicit.

bmichanderson commented 2 years ago

It seems (based on my tests on my computer) that the current version of the script will actually create "cores" number of parallel assemblies, each using "cores" cores and "max_memory". I don't think this makes sense, as it will go way beyond the actual "max_memory" setting and the available cores. I've added an argument, "instances", that makes this more explicit. Now the "cores" argument is per SPAdes assembly, the "max_memory" argument is per SPAdes assembly, and "instances" determines how many assemblies the user wants to run in parallel. The actual demand on the system is "instances" × "cores" cores and "instances" × "max_memory" memory.

It seems like this is what was perhaps going to be implemented, given "cores" and "max_memory" are passed to the spades_assembly function along with "args", but neither "cores" nor "max_memory" are used in the function (just "args.cores" and "args.max_memory"), leading to the weird behaviour.
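To illustrate the pattern described above, here is a minimal, hypothetical sketch (not the actual assemble_reads.py code): the fixed spades_assembly uses its own cores/max_memory parameters rather than the shadowing args.cores/args.max_memory, and a pool of size "instances" controls how many assemblies run in parallel. All function and variable names besides spades_assembly are illustrative assumptions.

```python
from multiprocessing import Pool


def spades_assembly(sample, cores, max_memory):
    # Use the per-assembly parameters, not args.cores / args.max_memory,
    # so each SPAdes run is limited to exactly what was requested.
    # Returning the command string here rather than running it keeps the
    # sketch self-contained.
    return f"spades.py -o {sample} --threads {cores} --memory {max_memory}"


def assemble_all(samples, instances, cores, max_memory):
    # Run `instances` assemblies in parallel, each limited to `cores`
    # threads and `max_memory` GB: total system demand is
    # instances x cores and instances x max_memory.
    with Pool(processes=instances) as pool:
        return pool.starmap(
            spades_assembly,
            [(s, cores, max_memory) for s in samples],
        )


if __name__ == "__main__":
    cmds = assemble_all(["sampleA", "sampleB"], instances=2, cores=4, max_memory=8)
    for cmd in cmds:
        print(cmd)
```

With instances=2 and cores=4, at most 8 cores are in use at once, instead of the cores × cores behaviour the previous version produced.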

There may be other ways to ideally fix this behaviour, so perhaps this pull request is not desired.

Cheers, Ben