Open baozg opened 4 years ago
Hi Zhigui,
Thanks for trying purge_dups, very impressive results.
For your questions:
How to save the RNA mapping rate? Would you give me some advice? One thing you can try is polishing with Arrow, Racon, or a similar tool. If that does not help, the lower rate may simply be normal, since some genes may sit in the haplotigs rather than in the primary contigs.
Can N be filled by Arrow or Pilon? Yes, for some gaps. We have some scaffolds where the haplotypic duplications were removed cleanly, and Arrow filled their gaps.
If you have any further questions, please feel free to ask.
Thanks for trying purge_dups.
Dengfeng.
Hi Dengfeng,
Thanks for the prompt response.
I think polishing would recover few genes, since the Falcon-Unzip version has the highest rate, so the genes could be in the haplotigs. I will update the results when all polishing is done.
Arrow / Racon could fill some gaps in the assembly. But I have some other questions about polishing.
Typically, polishing takes several rounds. My Falcon-Unzip assembly (polished with arrow*1) seems good enough for haplotype removal. The ONT assembly, however, has a higher error rate and a worse BUSCO score when the raw assembly finishes, so we typically polish for more rounds (racon*3 + medaka + pilon*2). At which step should I run purge_dups (racon & medaka are based on ONT data, Pilon on Illumina data)? Could it be purged as soon as Canu finishes, since purge_dups only needs the raw ONT data and the assembly itself?
After purging, should the primary contig and haplotig FASTA files be polished together or separately?
Combine the primary contigs and haplotigs, align with minimap2, and polish.
Map the raw data to the primary contigs and haplotigs separately, and polish them separately.
Purging haplotigs in $hap_asm: the README says:
Step 4. Merge hap.fa and $hap_asm and redo the above steps to get a decent haplotig set.
Just to be clear: I take the hap.fa (purged from the primary contigs of Unzip), cat hap.fa and $hap_asm (cns_h_ctg.fa) together, then run the purge_dups pipeline on the result to get purged.fa and hap.fa. So is this round's purged.fa my complete set of decent haplotigs?
Zhigui Bao
Hi Zhigui,
For your questions:
At which step should I run purge_dups (racon & medaka are based on ONT data, Pilon on Illumina data)? Could it be purged as soon as Canu finishes, since purge_dups only needs the raw ONT data and the assembly itself?
The first question is rather complex and I do not have a good answer; maybe run purge_dups after all polishing steps are done. As for the second question, you could run purge_dups on the Canu assembly, but the parameters are optimized for Falcon-Unzip contigs, so you may need to tune them. Tell me if it does not work well. And remember to change the minimap2 option for ONT data.
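As a rough illustration of the last point, here is a minimal Python sketch that picks the minimap2 preset from the read type before mapping reads to the assembly. `map-pb` and `map-ont` are standard minimap2 presets; the function name, the platform labels, and the file names are hypothetical and only serve the example.

```python
import shlex

# Real minimap2 presets, keyed by hypothetical platform labels used only in this sketch.
PRESETS = {"pacbio": "map-pb", "ont": "map-ont"}


def minimap2_command(assembly, reads, platform):
    """Build the argument list for mapping one read file to the assembly."""
    preset = PRESETS[platform]  # raises KeyError for an unknown platform label
    # minimap2 writes PAF to standard output by default.
    return ["minimap2", "-x", preset, assembly, reads]


# Hypothetical file names, for illustration only.
cmd = minimap2_command("canu.contigs.fasta", "ont_reads.fastq.gz", "ont")
print(shlex.join(cmd))
```

The list form can be passed to `subprocess.run` as-is; redirecting and compressing the PAF output is left out of the sketch.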
After purging, should the primary contig and haplotig FASTA files be polished together or separately?
I think combining both files and polishing them together is the right way. That way, reads from the primary contigs and from their corresponding haplotigs can be assigned correctly to their own loci. If you polish separately using all reads, the reads for the primary contigs and the corresponding haplotigs will be mapped to the same place, which will lead to wrong polishing results. Is that clear?
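The combined approach could be sketched as below: write the primary contigs and haplotigs into one reference, polish that reference with all reads, then split the polished output back into two files by contig name. This is only a sketch under assumptions. All function and file names are hypothetical, the polishing itself is not shown, and it assumes the polisher keeps contig names unchanged (some tools append a suffix, which would need to be stripped first).

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def combine_for_polishing(primary, haplotigs, combined):
    """Write both contig sets into one reference; return the primary contig names."""
    primary_names = set()
    with open(combined, "w") as out:
        for header, seq in read_fasta(primary):
            primary_names.add(header.split()[0])
            out.write(f">{header}\n{seq}\n")
        for header, seq in read_fasta(haplotigs):
            out.write(f">{header}\n{seq}\n")
    return primary_names


def split_after_polishing(polished, primary_names, primary_out, haplotig_out):
    """Route each polished contig back to the primary or the haplotig file."""
    with open(primary_out, "w") as p_out, open(haplotig_out, "w") as h_out:
        for header, seq in read_fasta(polished):
            target = p_out if header.split()[0] in primary_names else h_out
            target.write(f">{header}\n{seq}\n")
```

Keeping the set of primary names from the combine step is what lets the two files be recovered after polishing without relying on a naming convention.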
Step 4. Is purged.fa my complete set of decent haplotigs? Yes.
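For the merge in Step 4, a plain `cat` of hap.fa and $hap_asm is enough. The Python sketch below does the same concatenation and additionally stops if two sequences share a name, which would be ambiguous in later steps. The function name and the duplicate check are additions for illustration, not part of purge_dups.

```python
def merge_fasta(inputs, merged):
    """Concatenate FASTA files, stopping at the first duplicate sequence name.

    Returns the number of sequences written.
    """
    seen = set()
    with open(merged, "w") as out:
        for path in inputs:
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        fields = line[1:].split()
                        name = fields[0] if fields else ""
                        if name in seen:
                            raise ValueError(f"duplicate sequence name: {name}")
                        seen.add(name)
                    # Make sure a file without a trailing newline does not fuse records.
                    out.write(line if line.endswith("\n") else line + "\n")
    return len(seen)
```

A call such as `merge_fasta(["hap.fa", "cns_h_ctg.fa"], "merged_hap.fa")` would produce the input for the second purge_dups round; the output file name here is hypothetical.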
Cheers.
Dengfeng.
Hi, Dengfeng
purge_dups is a super easy to use tool. Thanks for developing it. I used Falcon-Unzip to assemble an outbred plant (270M genome size, 1.8% het, based on GenomeScope), but the Unzip cns_p_ctg.fasta is 450Mb, which is 1.5x of my estimated genome size. After assembly, I tried purge_dups and purge_haplotigs to remove haplotigs. The results are shown below. Although purge_haplotigs gives a more contiguous genome, it seems to have some unpurged haplotigs based on duplicated BUSCOs (the KAT plot also showed this). I think the purge_dups assembly seems perfect based on the KAT plot and BUSCO. So I have two questions.
purge_base.cov
The cutoffs are:
KAT plot for the three assemblies
Zhigui Bao