EMBL-PKU / BASALT

MIT License

run BASALT with multiple assemblies and short reads from several samples #15

Open liupfskygre opened 7 months ago

liupfskygre commented 7 months ago

Hi, I am trying to run BASALT on several assemblies from different but similar samples. Say:

sample 1, with assembly1.fa and SR1_r1.fq,SR1_r2.fq
sample 2, with assembly2.fa and SR2_r1.fq,SR2_r2.fq
plus a PacBio HiFi long-read file, hc1.fq

as stated "You may put as many assemblies as you have, and as many SR or LR datasets as you have"

BASALT -a assembly1.fa,assembly2.fa -s SR1_r1.fq,SR1_r2.fq/SR2_r1.fq,SR2_r2.fq -c hc1.fq -t 60 -m 250

Here I am not sure how the mapping would be carried out by bowtie2.

I am guessing it is a full mapping across the different assemblies and reads, like:

SR1_r1.fq,SR1_r2.fq --> assembly1.fa ---> 1.bam
SR1_r1.fq,SR1_r2.fq --> assembly2.fa ---> 2.bam
SR2_r1.fq,SR2_r2.fq --> assembly1.fa ---> 3.bam
SR2_r1.fq,SR2_r2.fq --> assembly2.fa ---> 4.bam

Or are assembly1.fa and assembly2.fa merged first (---> merged.assembly.fa), so that only 2 BAM files are generated:

SR1_r1.fq,SR1_r2.fq --> merged.assembly.fa ---> 1.bam
SR2_r1.fq,SR2_r2.fq --> merged.assembly.fa ---> 2.bam

and then the coverage file is calculated for the binner?

I am not sure which is the case.

Thanks, BASALT team.

noddevil4949 commented 7 months ago

Hello,

BASALT maps each read set against each assembly file. As in your first example, it does a full mapping across all the different assemblies and reads. You will see BAM files like:

SR1_r1.fq,SR1_r2.fq --> assembly1.fa ---> 1_DNA-1.bam
SR1_r1.fq,SR1_r2.fq --> assembly2.fa ---> 2_DNA-1.bam
SR2_r1.fq,SR2_r2.fq --> assembly1.fa ---> 1_DNA-2.bam
SR2_r1.fq,SR2_r2.fq --> assembly2.fa ---> 2_DNA-2.bam

Your command looks mostly correct; note that the flag for long reads is -l.

Thanks! Galaxy
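
For anyone checking which BAM files to expect, the full cross-mapping described in this reply can be sketched in a few lines of Python. This is only an illustration, not BASALT's own code: the helper names `parse_short_reads` and `expected_bams` are hypothetical, and the `<assembly index>_DNA-<sample index>.bam` naming is inferred from the four example names listed in the reply.

```python
from itertools import product


def parse_short_reads(arg):
    """Split a '-s' style value: samples separated by '/', mates by ','."""
    return [tuple(sample.split(",")) for sample in arg.split("/")]


def expected_bams(assemblies_arg, short_reads_arg):
    """List (reads, assembly, bam_name) for every read set x assembly pair."""
    assemblies = assemblies_arg.split(",")
    samples = parse_short_reads(short_reads_arg)
    rows = []
    # Read sets vary slowest and assemblies fastest, which reproduces the
    # order of the listing in the reply. The naming pattern is an assumption
    # inferred from that listing.
    for (s_idx, reads), (a_idx, assembly) in product(
        enumerate(samples, start=1), enumerate(assemblies, start=1)
    ):
        rows.append((reads, assembly, f"{a_idx}_DNA-{s_idx}.bam"))
    return rows


if __name__ == "__main__":
    for reads, assembly, bam in expected_bams(
        "assembly1.fa,assembly2.fa",
        "SR1_r1.fq,SR1_r2.fq/SR2_r1.fq,SR2_r2.fq",
    ):
        print(",".join(reads), "-->", assembly, "--->", bam)
```

With N assemblies and M short-read samples this gives N x M mappings, so the number of BAM files grows quickly as more assemblies and samples are added.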

EMBL-PKU commented 7 months ago

If you are using HiFi data, please use the --hifi option; it is different from the one for PacBio long reads.
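
Putting the two replies together, the original command with the HiFi file moved to `--hifi` might be assembled as below. This is a sketch under assumptions: the helper `build_basalt_command` is hypothetical, and it assumes `--hifi` takes the reads file as its value in the same way the other options do; check `BASALT -h` for the exact syntax before running.

```python
import shlex


def build_basalt_command(assemblies, short_read_pairs, hifi_reads, threads, memory):
    """Assemble a BASALT argument list (hypothetical helper, nothing is run)."""
    return [
        "BASALT",
        "-a", ",".join(assemblies),
        # Mates of one sample are joined by ',', samples by '/'.
        "-s", "/".join(",".join(pair) for pair in short_read_pairs),
        # Assumption: --hifi is followed by the HiFi reads file.
        "--hifi", hifi_reads,
        "-t", str(threads),
        "-m", str(memory),
    ]


if __name__ == "__main__":
    cmd = build_basalt_command(
        ["assembly1.fa", "assembly2.fa"],
        [("SR1_r1.fq", "SR1_r2.fq"), ("SR2_r1.fq", "SR2_r2.fq")],
        "hc1.fq",
        threads=60,
        memory=250,
    )
    # Print the command for inspection instead of executing it.
    print(shlex.join(cmd))
```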

liupfskygre commented 7 months ago

Thanks, this is quite clear.