adigenova / wengan

An accurate and ultra-fast hybrid genome assembler
GNU Affero General Public License v3.0

Recommendation for Illumina + PacBio HiFi + ultralong ONT? #40

Closed dhoconno closed 3 years ago

dhoconno commented 3 years ago

Hi there,

Congratulations on a fantastic, useful tool. I have high coverage ultralong ONT (~70x), Illumina WGS (30x), and PacBio HiFi (currently ~10x). When I run in ccsont mode with just the ONT and HiFi reads, I get pretty good results. Is there a way to utilize all three types of data to potentially improve the assemblies further, perhaps by reducing homopolymer indels a bit through use of the Illumina reads? When I try to pass all three types of data to wengan using the --ccsont preset, it seems like the Illumina reads are ignored.

Thanks!

adigenova commented 3 years ago

Hi dhoconno, thanks! Glad that wengan has been useful for your genome. Regarding your question, it might be possible to include the short-read data, because the current version of the --ccsont pipeline uses a multi-k-mer approach with the following k-mer sizes: 41, 81, 121, 161, 201, 251, 301, 351. The short reads could therefore be used in the high-quality read assembly step up to k=121, for instance (or up to k=201 if your short reads are 251 bp long).
At the moment this is not included in the pipeline and, as you said, the short reads are ignored. A quick alternative is to edit the makefile that Wengan generates to control the assembly execution.
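
To make the k-mer cutoff concrete, here is a minimal sketch that lists which of the k values above a given short-read length could feed. The function name `usable_kmers` is hypothetical, and the rule "k strictly smaller than the read length" is an assumption inferred from the two examples above (k=201 for 251 bp reads), not something taken from the Wengan code.

```shell
#!/usr/bin/env bash
# k-mer sizes of the ccsont pipeline, as listed in this thread.
WENGAN_CCSONT_KMERS="41 81 121 161 201 251 301 351"

# usable_kmers READ_LEN   (hypothetical helper, not part of Wengan)
# Print the k values that are strictly smaller than READ_LEN.
# Assumption: short reads are only useful for k below their length.
usable_kmers() {
    local read_len="$1" k out=""
    for k in $WENGAN_CCSONT_KMERS; do
        if [ "$k" -lt "$read_len" ]; then
            out="${out:+$out }$k"   # append with a single space separator
        fi
    done
    echo "$out"
}
```

With this rule, `usable_kmers 251` prints `41 81 121 161 201`, and `usable_kmers 150` (assuming 150 bp reads) prints `41 81 121`.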

Let's say that the wengan command is the following:

perl wengan.pl -x ccsont -a M -s lib1.fwd.fastq.gz,lib1.rev.fastq.gz -l ont.fastq.gz -p asm1 -t 20 -g 3000 -n -b hifi.fastq.gz
# -n is important: it only creates the makefile without executing it, like a preview of the commands that wengan will run.

It will then generate the makefile asm1.mk, which you can edit as follows:

.DELETE_ON_ERROR:
#Wengan automatic generated makefile
asm1.ccs.ec.fa : 
        zcat hifi.fastq.gz  |  /Users/adigenova/Git/wengan/bin/seqtk seq  -l 60 -A -C -  > asm1.ccs.ec.fa

asm1.minia.41.contigs.fa : asm1.ccs.ec.fa
        @echo asm1.ccs.ec.fa  >  asm1.minia_reads.41.txt
        #here we add the short-read data
        @echo lib1.fwd.fastq.gz  >>  asm1.minia_reads.41.txt
        @echo lib1.rev.fastq.gz  >>  asm1.minia_reads.41.txt
        /Users/adigenova/Git/wengan/bin/minia -in asm1.minia_reads.41.txt -kmer-size 41 -abundance-min 2 -out asm1.minia.41 -minimizer-size 10 -max-memory 5000 -nb-cores 20 2> asm1.minia.41.err > asm1.minia.41.log
        -rm -f asm1.minia.41.unitigs.fa.glue* asm1.minia.41.h5 asm1.minia.41.unitigs.fa

asm1.minia.81.contigs.fa : asm1.minia.41.contigs.fa
        @echo asm1.ccs.ec.fa  >  asm1.minia_reads.81.txt
#here we add the short-read data
        @echo lib1.fwd.fastq.gz  >>  asm1.minia_reads.81.txt
        @echo lib1.rev.fastq.gz  >>  asm1.minia_reads.81.txt
        @echo asm1.minia.41.contigs.fa  >>  asm1.minia_reads.81.txt
        @echo asm1.minia.41.contigs.fa  >>  asm1.minia_reads.81.txt
        @echo asm1.minia.41.contigs.fa  >>  asm1.minia_reads.81.txt
        /Users/adigenova/Git/wengan/bin/minia -in asm1.minia_reads.81.txt -kmer-size 81 -abundance-min 2 -out asm1.minia.81 -minimizer-size 10 -max-memory 5000 -nb-cores 20 2> asm1.minia.81.err > asm1.minia.81.log
        -rm -f asm1.minia.81.unitigs.fa.glue* asm1.minia.81.h5 asm1.minia.81.unitigs.fa

asm1.minia.121.contigs.fa : asm1.minia.81.contigs.fa
        @echo asm1.ccs.ec.fa  >  asm1.minia_reads.121.txt
#here we add the short-read data
        @echo lib1.fwd.fastq.gz  >>  asm1.minia_reads.121.txt
        @echo lib1.rev.fastq.gz  >>  asm1.minia_reads.121.txt
......
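
The manual edit above could also be scripted. The following is an untested sketch, not part of Wengan: the function name `add_short_reads` is hypothetical, and it assumes every stage of the generated makefile starts its read list with a line shaped like the `@echo asm1.ccs.ec.fa  >  asm1.minia_reads.41.txt` line in the excerpt.

```shell
#!/usr/bin/env bash
# add_short_reads MAKEFILE MAX_K FWD REV   (hypothetical helper)
# Print a copy of MAKEFILE in which FWD and REV are appended to the minia
# read list of every stage with k <= MAX_K. The input file is not modified.
# Assumption: each stage opens its list with a recipe line of the form
#   <TAB>@echo <prefix>.ccs.ec.fa  >  <prefix>.minia_reads.<k>.txt
add_short_reads() {
    awk -v maxk="$2" -v fwd="$3" -v rev="$4" '
    { print }                               # copy every input line as-is
    /^[ \t]+@echo .*\.ccs\.ec\.fa +> +.*\.minia_reads\.[0-9]+\.txt *$/ {
        list = $NF                          # e.g. asm1.minia_reads.41.txt
        n = split(list, part, ".")          # k is the second-to-last piece
        k = part[n - 1] + 0
        if (k <= maxk + 0) {
            # recipe lines must start with a tab for make
            printf "\t@echo %s  >>  %s\n", fwd, list
            printf "\t@echo %s  >>  %s\n", rev, list
        }
    }' "$1"
}
```

For example, `add_short_reads asm1.mk 121 lib1.fwd.fastq.gz lib1.rev.fastq.gz > asm1.short.mk` would write the edited copy to asm1.short.mk (an arbitrary file name), leaving asm1.mk untouched for comparison.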

Then you can use the edited makefile to run the whole pipeline with:

make -f asm1.mk all
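
Before launching the full run, it may be worth checking that the short reads really ended up in the read lists. A small sketch, assuming the read lists are named `<prefix>.minia_reads.<k>.txt` as in the excerpt above; the helper name `stages_with_reads` is hypothetical.

```shell
#!/usr/bin/env bash
# stages_with_reads MAKEFILE READS   (hypothetical helper)
# Print, one per line, the k values whose minia read list receives READS
# through an "@echo READS > list" or "@echo READS >> list" recipe line.
stages_with_reads() {
    awk -v reads="$2" '
    $1 == "@echo" && $2 == reads && $NF ~ /\.minia_reads\.[0-9]+\.txt$/ {
        n = split($NF, part, ".")   # <prefix>.minia_reads.<k>.txt
        print part[n - 1]           # the k value
    }' "$1"
}
```

After the edit shown above, `stages_with_reads asm1.mk lib1.fwd.fastq.gz` should list 41, 81 and 121. In addition, `make -n -f asm1.mk all` prints the commands make would run without executing them.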

I have not tested this read-set combination, but the method above is a reasonable way to include the short-read data in your assembly.

Best, Alex

dhoconno commented 3 years ago

Thanks! I'll give that a whirl!

markopetek commented 2 years ago

@dhoconno Please let us know if adding the Illumina reads improved your assembly. Thanks.