multiple threads for mapping multiple species

Yuqia commented 8 months ago

Hello,

I'd like to use multiple threads for mapping multiple species on a cluster using the option --single_mapping,
but I don't understand what its description, "Single species file allowing to map in a job array", means.

Could you please give an example with the command line?

Thank you very much in advance!

sinamajidian commented 8 months ago

Dear @Yuqia, thanks for contacting us. To run read2tree on a cluster with multiple threads and multiple species, you can treat each of the following lines as a job:

read2tree --standalone_path marker_genes --output_path output --reference --dna_reference dna_ref.fa
read2tree --standalone_path marker_genes --output_path output --reads species1_R1.fastq species1_R2.fastq --threads 30
read2tree --standalone_path marker_genes --output_path output --reads species2_R1.fastq species2_R2.fastq --threads 30
read2tree --standalone_path marker_genes --output_path output --reads species3_R1.fastq species3_R2.fastq --threads 30
read2tree --standalone_path marker_genes --output_path output --merge_all_mappings --tree

Once the first command has finished, the line for each species can be submitted as a separate job. After all species have finished, run the last line to infer the tree including all species.
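
As a rough sketch of how the per-species lines could be generated rather than typed by hand, the snippet below prints one mapping command per species. The helper name build_mapping_cmd and the species names are hypothetical, and it assumes the reads follow the <species>_R1.fastq / <species>_R2.fastq naming used above. It only prints the commands; how each one is submitted depends on your scheduler.

```shell
#!/bin/sh
# Hypothetical helper (not part of read2tree): print the mapping command
# for one species. $1 = species prefix, $2 = thread count (defaults to 30).
# Assumes reads are named <species>_R1.fastq and <species>_R2.fastq.
build_mapping_cmd() {
  printf 'read2tree --standalone_path marker_genes --output_path output --reads %s_R1.fastq %s_R2.fastq --threads %s\n' \
    "$1" "$1" "${2:-30}"
}

# Step 1 (run once, before any mapping job): build the reference.
#   read2tree --standalone_path marker_genes --output_path output --reference --dna_reference dna_ref.fa

# Step 2: one independent command per species; each can be its own job.
for sp in species1 species2 species3; do
  build_mapping_cmd "$sp" 30
done

# Step 3 (run once, after all mapping jobs have finished): merge and infer the tree.
#   read2tree --standalone_path marker_genes --output_path output --merge_all_mappings --tree
```

Each printed line can then be handed to the scheduler as its own job. On a Slurm cluster, for example, one option would be sbatch --cpus-per-task=30 --wrap="<command>", though that is an assumption about your setup rather than something read2tree requires.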

We'll improve the documentation for that option. Please let us know whether this answers the question.

Best regards, Sina

Yuqia commented 8 months ago

Dear Sina,

Many thanks for your reply. Does this mean that each species is mapped sequentially using multiple threads, rather than multiple species being mapped simultaneously with one thread per species?

Best, Yuqia

sinamajidian commented 8 months ago

You're welcome. Each species is mapped separately using multiple threads.

Yuqia commented 8 months ago

Thanks a lot Sina! All is clear.

DessimozLab / read2tree

multiple threads for mapping multiple species #49