JiaoLaboratory / CRAQ

Identification of errors in draft genome assemblies with single-base pair resolution for quality assessment and improvement
https://doi.org/10.1038/s41467-023-42336-w
MIT License

Can CRAQ be used as software for detecting haplotype assembly errors? Is it capable of disassembling erroneous assemblies? #12

Closed yt619 closed 6 months ago

yt619 commented 7 months ago

For the assembly of homologous polyploids, phasing presents the greatest challenge. Is it possible to identify phasing errors in assemblies using ONT data, and to break erroneous assemblies apart? We hope you can assist us in resolving such issues, as haplotype resolution in polyploids is critically important. The accuracy of our ONT reads appears to be only around Q18, so can false error calls caused by read errors be avoided at the parameter level?

Thank you very much for your attention to this matter.

Sincerely, Tuo Yang

JiaoLaboratory commented 7 months ago

Sorry for the late reply. This is a good question. Currently, CRAQ does not evaluate haplotype switches. I have been pondering this question for a long time, but unfortunately I still cannot provide a definitive answer. However, some of CRAQ's outputs may serve as a useful reference.

For diploid organisms, CRAQ reports true biological differences between the two haplotypes as CRH/CSH, and assembly errors as CRE/CSE. Users can run CRAQ on haplotype 1 and haplotype 2 separately to detect assembly errors in each (for example: CRAQ -g hap1.fa -sms sms.fq -ngs NGS.R1.fq,NGS.R2.fq, followed by CRAQ -g hap2.fa -sms sms.fq -ngs NGS.R1.fq,NGS.R2.fq).
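The per-haplotype runs described above could be scripted roughly as follows. This is only a minimal sketch: the helper name craq_commands is hypothetical, the executable name and the -g, -sms and -ngs options are copied from the commands in this comment, and the file names are placeholders. It only builds and prints the command lines; nothing is executed.

```python
import shlex


def craq_commands(haplotype_fastas, sms_reads, ngs_r1, ngs_r2, executable="CRAQ"):
    """Build one command line per haplotype FASTA.

    Only the -g, -sms and -ngs options quoted in this thread are used;
    the executable name is an assumption taken from the same text.
    """
    commands = []
    for fasta in haplotype_fastas:
        commands.append([
            executable,
            "-g", fasta,                    # haplotype assembly to evaluate
            "-sms", sms_reads,              # long reads (e.g. ONT)
            "-ngs", f"{ngs_r1},{ngs_r2}",   # paired short reads, comma-separated
        ])
    return commands


# Print the two command lines, one per haplotype, without running them.
for cmd in craq_commands(["hap1.fa", "hap2.fa"], "sms.fq", "NGS.R1.fq", "NGS.R2.fq"):
    print(shlex.join(cmd))
```

Each list could be passed to subprocess.run to launch the runs one after the other. Running them sequentially also keeps the two runs from competing for resources or writing to the same place.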

However, evaluating haplotype phasing also requires detecting haplotype switches (a region that was assembled into hap1 but actually belongs to hap2), a feature that CRAQ currently lacks. The low accuracy of ONT reads makes this even harder. I believe the best current approach to phasing evaluation relies on having the two parental genomes available as references.
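To make the idea of a haplotype switch concrete, here is a toy sketch. It is not a CRAQ feature. It assumes you already have parent-specific markers placed along one assembled haplotype, each labelled with the parent it matches. The function name, the positions and the P1/P2 labels are all hypothetical, and real data would need filtering for noisy markers.

```python
def find_switches(marker_origins):
    """Report where the parental origin changes between consecutive markers.

    marker_origins: list of (position, parent_label) tuples sorted by position
    along one assembled haplotype. Returns a list of
    (left_position, right_position, old_label, new_label) tuples.
    """
    switches = []
    previous = None
    for position, origin in marker_origins:
        if previous is not None and origin != previous[1]:
            switches.append((previous[0], position, previous[1], origin))
        previous = (position, origin)
    return switches


# Hypothetical markers: the contig follows parent P1, then P2, then P1 again.
markers = [(1000, "P1"), (5000, "P1"), (9000, "P2"), (12000, "P2"), (20000, "P1")]
for left, right, old, new in find_switches(markers):
    print(f"possible switch between {left} and {right}: {old} -> {new}")
```

A correctly phased haplotype would carry a single label along its whole length, so every reported interval is a candidate switch point to inspect.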