JiaoLaboratory / CRAQ

Identification of errors in draft genome assemblies with single-base pair resolution for quality assessment and improvement
https://doi.org/10.1038/s41467-023-42336-w
MIT License

interpretation of results #6

Open Wanjie-Feng opened 7 months ago

Wanjie-Feng commented 7 months ago

Hello, thanks for developing a very useful tool. I have a question about its output. My command is as follows:

perl  ./CRAQ/bin/craq -g chr.fasta -sms hifi.sorted.bam -pl T 

After the run completed normally, I looked mainly at two files: out_final.CRE.bed and out_final.CSE.bed. The contents of out_final.CRE.bed are as follows:

m01 1   1   m01:1   CRE
m01 60253887    60253887    m01:60253887    CRE
m02 1   1   m02:1   CRE
m02 44039054    44039054    m02:44039054    CRE
m02 54796608    54796608    m02:54796608    CRE
m03 1   1   m03:1   CRE
m03 11057839    11057839    m03:11057839    CRE
m03 49202667    49202667    m03:49202667    CRE
m04 1   1   Gm04:1  CRE
m04 36798166    36798166    m04:36798166    CRE
m04 55614844    55614844    m04:55614844    CRE

The contents of out_final.CSE.bed are as follows:

m08 15154   15155   m08:15154   CSE
m11 27340675    27340676    m11:27340675    CSE
m17 22122332    22122333    m17:22122332    CSE
m17 24023509    24023510    m17:24023509    CSE
m17 24032958    24032959    m17:24032958    CSE

I don't understand why the regions in my bed files have a length of 1. I hope you can help. Thank you.

JiaoLaboratory commented 7 months ago

Hi, thank you very much for using CRAQ. CRAQ uses the sequencing reads to find clipping breakpoints. With the "--ser T" option, CRAQ also searches the region near each breakpoint for a cluster of low-quality base SNPs. When a region in the bed file has a length of 1, that position is an exact breakpoint, as shown below:

image

Wanjie-Feng commented 7 months ago

The -ser parameter is T by default, but I don't see the corresponding information in the results. My understanding of the results is still not very clear. Could you add more information about how to interpret the result files?
Thank you.

JiaoLaboratory commented 7 months ago

The columns of out_final.CSE.bed and out_final.CRE.bed can be described as:

chrid error_start error_end error_breakpoint error_type
m17 22122332 22122333 m17:22122332 CSE

In your results, error_start = error_end. This is normal: it means there is no mapping gap near the breakpoint (left panel of the figure above).
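
To check how wide each reported region is, the five columns described above can be read into a small record and the end and start coordinates subtracted. The following is a minimal Python sketch, not part of CRAQ: the `CraqRecord` class, its `span` property, and `parse_craq_bed_line` are hypothetical names, and it assumes the fields are separated by whitespace, as in the lines pasted in this thread. The thread does not state whether coordinates are 0-based or 1-based, so `span` is simply error_end minus error_start.

```python
from typing import NamedTuple


class CraqRecord(NamedTuple):
    """One line of out_final.CRE.bed or out_final.CSE.bed (hypothetical helper)."""
    chrid: str
    error_start: int
    error_end: int
    error_breakpoint: str
    error_type: str  # "CRE" or "CSE" in the files shown in this thread

    @property
    def span(self) -> int:
        # Plain difference between the end and start columns.
        return self.error_end - self.error_start


def parse_craq_bed_line(line: str) -> CraqRecord:
    # Assumption: exactly five whitespace-separated fields per line.
    chrid, start, end, bp, error_type = line.split()
    return CraqRecord(chrid, int(start), int(end), bp, error_type)


# Two lines copied from the results posted above.
lines = [
    "m02 44039054    44039054    m02:44039054    CRE",
    "m17 22122332    22122333    m17:22122332    CSE",
]
records = [parse_craq_bed_line(line) for line in lines]

for rec in records:
    print(rec.chrid, rec.error_breakpoint, rec.error_type, rec.span)
```

Regions that come out with a difference of 0 or 1 are the single-position breakpoints discussed in this thread.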

JiaoLaboratory commented 7 months ago

image

Wanjie-Feng commented 7 months ago

Thank you for your reply and explanation, best wishes