Closed dhoconno closed 3 years ago
Hi dhoconno,
Thanks! Glad that Wengan has been useful for your genome.
Regarding your question, it might be possible to include the short-read data, because the current version of the --ccsont pipeline uses a multi-k-mer approach with the following k-mer sizes: 41, 81, 121, 161, 201, 251, 301, 351. The short reads could therefore be used in the high-quality read assembly step up to k=121, for instance (or up to k=201 if your short reads are 251 bp long).
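To make the cutoff concrete, here is a small shell sketch that lists which of those k-mer sizes a given short-read length could contribute to. It assumes the rule is simply "k strictly smaller than the read length", which matches the k=201 limit for 251 bp reads mentioned above. The 151 bp read length and the function name usable_kmers are only illustrative and are not part of Wengan.

```shell
# k-mer sizes used by the ccsont pipeline, as listed above
KMERS="41 81 121 161 201 251 301 351"

# Print the k-mer sizes that are strictly smaller than the given read length.
# Assumption: a read can only contribute to an assembly round when k < read length.
usable_kmers() {
    read_len=$1
    out=""
    for k in $KMERS; do
        if [ "$k" -lt "$read_len" ]; then
            out="$out $k"
        fi
    done
    # Unquoted on purpose so the leading space is dropped
    echo $out
}

usable_kmers 151   # prints: 41 81 121
usable_kmers 251   # prints: 41 81 121 161 201
```

The largest value printed is the last k-mer round where adding the short reads still makes sense.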
At the moment this is not included in the pipeline and, as you said, the short reads are being ignored. A quick alternative is to edit the makefile that Wengan generates to control the assembly execution.
Let's say that the Wengan command is the following:
perl wengan.pl -x ccsont -a M -s lib1.fwd.fastq.gz,lib1.rev.fastq.gz -l ont.fastq.gz -p asm1 -t 20 -g 3000 -n -b hifi.fastq.gz
# the -n option is important because it only creates the makefile without executing it; it is like a preview of the commands that Wengan will execute.
Wengan will then generate the makefile asm1.mk, which you can edit as follows:
.DELETE_ON_ERROR:
#Wengan automatic generated makefile
asm1.ccs.ec.fa :
	zcat hifi.fastq.gz | /Users/adigenova/Git/wengan/bin/seqtk seq -l 60 -A -C - > asm1.ccs.ec.fa
asm1.minia.41.contigs.fa : asm1.ccs.ec.fa
	@echo asm1.ccs.ec.fa > asm1.minia_reads.41.txt
#here we add the short-read data
	@echo lib1.fwd.fastq.gz >> asm1.minia_reads.41.txt
	@echo lib1.rev.fastq.gz >> asm1.minia_reads.41.txt
	/Users/adigenova/Git/wengan/bin/minia -in asm1.minia_reads.41.txt -kmer-size 41 -abundance-min 2 -out asm1.minia.41 -minimizer-size 10 -max-memory 5000 -nb-cores 20 2> asm1.minia.41.err > asm1.minia.41.log
	-rm -f asm1.minia.41.unitigs.fa.glue* asm1.minia.41.h5 asm1.minia.41.unitigs.fa
asm1.minia.81.contigs.fa : asm1.minia.41.contigs.fa
	@echo asm1.ccs.ec.fa > asm1.minia_reads.81.txt
#here we add the short-read data
	@echo lib1.fwd.fastq.gz >> asm1.minia_reads.81.txt
	@echo lib1.rev.fastq.gz >> asm1.minia_reads.81.txt
	@echo asm1.minia.41.contigs.fa >> asm1.minia_reads.81.txt
	@echo asm1.minia.41.contigs.fa >> asm1.minia_reads.81.txt
	@echo asm1.minia.41.contigs.fa >> asm1.minia_reads.81.txt
	/Users/adigenova/Git/wengan/bin/minia -in asm1.minia_reads.81.txt -kmer-size 81 -abundance-min 2 -out asm1.minia.81 -minimizer-size 10 -max-memory 5000 -nb-cores 20 2> asm1.minia.81.err > asm1.minia.81.log
	-rm -f asm1.minia.81.unitigs.fa.glue* asm1.minia.81.h5 asm1.minia.81.unitigs.fa
asm1.minia.121.contigs.fa : asm1.minia.81.contigs.fa
	@echo asm1.ccs.ec.fa > asm1.minia_reads.121.txt
#here we add the short-read data
	@echo lib1.fwd.fastq.gz >> asm1.minia_reads.121.txt
	@echo lib1.rev.fastq.gz >> asm1.minia_reads.121.txt
......
Then you can use the edited makefile to run the whole pipeline with:
make -f asm1.mk all
I have not tested this read-set combination, but the method above is a good way to include the short-read data in your assembly.
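If editing every rule by hand is tedious, the same change could be scripted. The following is only a rough sketch: it assumes that the generated makefile contains lines of the form "@echo PREFIX.ccs.ec.fa > PREFIX.minia_reads.K.txt", exactly as in the excerpt above, and it has not been run on a real Wengan makefile. The function name add_short_reads is hypothetical.

```shell
# Usage: add_short_reads MAKEFILE MAX_K READS...
# Writes a patched copy of MAKEFILE to stdout. After every line that starts a
# minia read list for a k-mer size <= MAX_K, one "@echo READ >> LIST" line is
# added per short-read file. The original makefile is not modified.
add_short_reads() {
    mk=$1
    max_k=$2
    shift 2
    awk -v max_k="$max_k" -v reads="$*" '
    { print }
    /^[ \t]*@echo .*\.ccs\.ec\.fa > .*\.minia_reads\.[0-9]+\.txt[ \t]*$/ {
        target = $NF                    # e.g. asm1.minia_reads.41.txt
        n = split(target, parts, ".")
        k = parts[n - 1] + 0            # k-mer size taken from the file name
        if (k <= max_k) {
            indent = $0
            sub(/@echo.*/, "", indent)  # reuse the leading tab, if any
            m = split(reads, r, " ")
            for (i = 1; i <= m; i++)
                print indent "@echo " r[i] " >> " target
        }
    }' "$mk"
}
```

For the example above, running "add_short_reads asm1.mk 121 lib1.fwd.fastq.gz lib1.rev.fastq.gz > asm1.sr.mk" and then "make -f asm1.sr.mk all" should be equivalent to the manual edit (asm1.sr.mk is just an example file name). It is worth checking the result with diff against asm1.mk before launching the assembly.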
Best, Alex
Thanks! I'll give that a whirl!
@dhoconno Please let us know if adding the Illumina reads improved your assembly. Thanks.
Hi there,
Congratulations on a fantastic, useful tool. I have high coverage ultralong ONT (~70x), Illumina WGS (30x), and PacBio HiFi (currently ~10x). When I run in ccsont mode with just the ONT and HiFi reads, I get pretty good results. Is there a way to utilize all three types of data to potentially improve the assemblies further, perhaps by reducing homopolymer indels a bit through use of the Illumina reads? When I try to pass all three types of data to wengan using the --ccsont preset, it seems like the Illumina reads are ignored.
Thanks!