AntonelliLab / seqcap_processor

Bioinformatic pipeline for processing Sequence Capture data for Phylogenetics
MIT License

Adjust parallel processing for assemble_reads.py #30

Closed · bmichanderson closed this 2 years ago

bmichanderson commented 2 years ago

Based on my tests on my computer, it seems that the current version of the script will actually create "cores" parallel assemblies, each using "cores" cores and "max_memory" memory. I don't think this makes sense, as it goes way beyond the actual "max_memory" setting and the available cores. I've added an argument, "instances", that makes the behaviour explicit: the "cores" and "max_memory" arguments now apply per SPAdes assembly, and "instances" determines how many assemblies the user wants to run in parallel. The actual demand on the system is therefore "instances" × "cores" cores and "instances" × "max_memory" memory.
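A minimal sketch of the intended semantics, assuming a worker pool over per-sample directories (the function names, file layout, and read format here are hypothetical, not the repository's actual code; the `--threads`/`--memory` flags are real SPAdes options):

```python
import subprocess
from functools import partial
from multiprocessing import Pool

def assemble_sample(sample_dir, cores, max_memory):
    # One SPAdes run, limited to its own per-assembly thread and memory budget.
    # (Hypothetical input layout: interlaced paired-end reads per sample dir.)
    subprocess.run([
        "spades.py",
        "-o", sample_dir + "/assembly",
        "--12", sample_dir + "/reads.fastq",
        "--threads", str(cores),      # threads per assembly
        "--memory", str(max_memory),  # memory cap per assembly (GB)
    ], check=True)

def assemble_all(sample_dirs, instances, cores, max_memory):
    # "instances" assemblies run at once, so the total system demand is
    # instances * cores threads and instances * max_memory GB.
    worker = partial(assemble_sample, cores=cores, max_memory=max_memory)
    with Pool(processes=instances) as pool:
        pool.map(worker, sample_dirs)
```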

It seems like this is what was perhaps intended, given that "cores" and "max_memory" are passed to the spades_assembly function along with "args"; however, neither "cores" nor "max_memory" is actually used inside the function (only "args.cores" and "args.max_memory" are), which leads to the weird behaviour.
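A stripped-down illustration of that parameter-shadowing bug (the function body is a hypothetical reconstruction, not the repository's actual code): the per-call parameters are accepted but ignored, so every parallel instance launches SPAdes with the full global budget.

```python
import subprocess

def spades_assembly(args, sample_dir, cores, max_memory):
    # BUG: "cores" and "max_memory" are never used; the call below reads the
    # global argparse values instead, so each parallel instance gets the full
    # "--cores"/"--max_memory" budget rather than its per-assembly share.
    subprocess.run([
        "spades.py", "-o", sample_dir,
        "--threads", str(args.cores),      # should be str(cores)
        "--memory", str(args.max_memory),  # should be str(max_memory)
    ], check=True)
```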

There may be better ways to fix this behaviour, so perhaps this pull request is not the desired approach.

Cheers, Ben

bmichanderson commented 2 years ago

Sorry, that's the "assembly_spades" function, not "spades_assembly".

bmichanderson commented 2 years ago

Sorry, I'm still figuring out GitHub pull requests. A bunch of other changes I've been making got added to the original pull request, so I'll try to make specific branches and open separate pull requests for them.