SionBayliss / PIRATE

A toolbox for pangenome analysis and threshold evaluation.
GNU General Public License v3.0

mafft alignment number of threads #64

Closed (luciagrami closed this issue 3 years ago)

luciagrami commented 3 years ago

Hello,

Is it possible to set the number of threads for MAFFT? I am working with ~1800 genomes, and the alignment step is taking too long and using only one CPU, so I would like to specify --threads for MAFFT.

Thanks!

SionBayliss commented 3 years ago

Hi Lucia,

PIRATE uses GNU parallel to run the individual alignments concurrently, each on a single thread. This is much faster than running MAFFT sequentially with a large number of threads per alignment. The number of threads is the same as the number provided to the main PIRATE script, or the value you set manually if the align_feature_sequences.pl and create_pangenome_alignment.pl scripts are run outside of the PIRATE pipeline.

Sometimes a problematic alignment, perhaps one that includes large numbers of truncated or duplicated ORFs, can take a long time to complete. This can give the impression that only one thread is being used for MAFFT when in fact multiple threads were in use earlier (i.e. only that one alignment is still waiting to complete). If this is the case, you might want to run align_feature_sequences.pl yourself and use more stringent cutoffs for the --threshold, --max-threshold or --dosage values. Note that align_feature_sequences.pl can be run multiple times with no conflict, so you could align your core or accessory genes separately, depending on your needs.

Also, the alignment step completes after the other sections of the pipeline, so PIRATE has already finalised your other outputs and you can use or analyse these while waiting for the alignments to complete.
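To illustrate why a run can appear to use a single CPU near the end, here is a minimal Python sketch of a fixed pool of single-threaded job slots. It is not PIRATE code and does not call MAFFT or GNU parallel. The function name `simulate_pool` and the job durations are made up for illustration, and it assumes jobs are handed out in list order to whichever slot frees up first.

```python
import heapq


def simulate_pool(durations, n_slots):
    """Simulate single-threaded jobs dispatched in order to n_slots workers.

    Returns (makespan, tail), where tail is the time between the
    second-to-last job finishing and the last job finishing, i.e. the
    period in which at most one slot is still busy.
    """
    # Each heap entry is the time at which a slot next becomes free.
    slots = [0.0] * n_slots
    heapq.heapify(slots)

    finish_times = []
    for duration in durations:
        start = heapq.heappop(slots)  # earliest available slot
        end = start + duration
        finish_times.append(end)
        heapq.heappush(slots, end)

    finish_times.sort()
    makespan = finish_times[-1]
    second_last = finish_times[-2] if len(finish_times) > 1 else 0.0
    return makespan, makespan - second_last


if __name__ == "__main__":
    # Hypothetical workload: one slow alignment (20 time units)
    # followed by eight fast ones (1 unit each), on 4 slots.
    makespan, tail = simulate_pool([20] + [1] * 8, n_slots=4)
    print(f"total time: {makespan}, time with one job left: {tail}")
```

In this made-up workload the eight fast jobs all finish within the first 3 time units, so for the remaining 17 of 20 units only the slow job is running, even though all four slots were in use at the start.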

All the best, Sion