JiaoLaboratory / CRAQ

Identification of errors in draft genome assemblies with single-base pair resolution for quality assessment and improvement
https://doi.org/10.1038/s41467-023-42336-w
MIT License

Seeking Insights on Error Discrepancies in Combined ONT and HiFi Sequencing Data #9

Closed Bank-tidy closed 9 months ago

Bank-tidy commented 9 months ago

Hello,

Following your suggestion, I merged the BAM files of the ONT and HiFi data to generate an image. The plant was sequenced with both second- and third-generation technologies, and the assembly was built from the ONT and HiFi data. In the image, the third-generation data shows fewer errors than the second-generation data. This discrepancy makes me question the reliability of error correction in the second-generation data. How should I interpret these results?

Thank you for your guidance!

[image: Circos plot of the CRAQ results]
JiaoLaboratory commented 9 months ago

Hello. You wrote that "the third-generation data in the image exhibits fewer errors compared to the second-generation data, which shows more errors." That is not the case.

By default, CRAQ reports two types of assembly error: CREs (such as small local indels, which can be spanned by SMS reads) and CSEs (such as structural misjunction errors, which cannot be spanned by SMS reads). Both CREs and CSEs are detected by combining SMS and NGS data; NGS data is not used on its own to detect CREs, nor SMS data on its own to detect CSEs. In your result, CREs outnumber CSEs, which is normal.

CRAQ does not correct CREs, because such errors can generally be fixed with several rounds of polishing (for example with Pilon or Racon). A CSE, by contrast, implies a structural mis-join that affects overall assembly continuity, so CRAQ breaks the contig at the CSE location so that the pieces can be re-scaffolded with Hi-C or Bionano maps.
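To make the contig-breaking step concrete, here is a minimal Python sketch of splitting one contig at reported breakpoint positions. It is only an illustration, not CRAQ's own code: the function name `split_contig_at_breakpoints`, the 0-based coordinate convention, and the `_partN` naming are all assumptions made for this example.

```python
def split_contig_at_breakpoints(name, sequence, breakpoints):
    """Split one contig into pieces at 0-based breakpoint positions.

    Hypothetical helper for illustration only. A breakpoint p starts a new
    piece at sequence[p]. Positions outside the open interval
    (0, len(sequence)) are ignored because they would yield an empty piece.
    """
    # De-duplicate and sort the usable cut positions.
    cuts = sorted({p for p in breakpoints if 0 < p < len(sequence)})
    bounds = [0] + cuts + [len(sequence)]

    pieces = []
    for i in range(len(bounds) - 1):
        start, end = bounds[i], bounds[i + 1]
        # Piece names are an assumed convention, not CRAQ's output format.
        pieces.append((f"{name}_part{i + 1}", sequence[start:end]))
    return pieces


# Toy example: a single structural breakpoint at position 4 on "chr9".
pieces = split_contig_at_breakpoints("chr9", "ACGTACGTAC", [4])
for piece_name, piece_seq in pieces:
    print(piece_name, piece_seq)
```

The resulting pieces would then be passed to a scaffolder together with the Hi-C or Bionano data; that step is outside the scope of this sketch.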

From the Circos plot, I can see only one CSE, on chr9. The S-AQI score must therefore be high, which means your assembly has high continuity. I can also see some CREs on each chromosome, which means your assembly still contains some local indel errors. If you want to improve the quality further, a few rounds of polishing may help. I do not know the exact CRE counts, but if the R-AQI score is over 95 (reference quality), I think the assembly is already very good, and whether to polish further is up to you.
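As a rough way to relate error counts to the AQI scores mentioned above, the sketch below assumes the score has the form AQI = 100 · exp(−0.1 · N / L), where N is the (normalized) error count and L is the assembly length in megabases. This formula is an assumption not stated in this thread, so check it against the CRAQ publication before relying on it. The function name `aqi` and the example numbers are hypothetical.

```python
import math


def aqi(error_count, assembly_length_bp):
    """Assumed AQI form: 100 * exp(-0.1 * N / L), with L in megabases.

    The formula is an assumption for illustration, not confirmed by the
    thread. error_count would be the CRE count for R-AQI or the CSE count
    for S-AQI.
    """
    if assembly_length_bp <= 0:
        raise ValueError("assembly length must be positive")
    length_mb = assembly_length_bp / 1_000_000
    return 100 * math.exp(-0.1 * error_count / length_mb)


# Hypothetical numbers: one CSE and 150 CREs in a 400 Mb assembly.
s_aqi = aqi(1, 400_000_000)
r_aqi = aqi(150, 400_000_000)
print(f"S-AQI = {s_aqi:.2f}, R-AQI = {r_aqi:.2f}")
```

Under this assumed formula, the score depends only on error density, so a score above 95 corresponds to fewer than about 0.51 errors per megabase.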

Bank-tidy commented 9 months ago

Thank you so much for your valuable insights and guidance on our genome assembly analysis using CRAQ!