JiaoLaboratory / CRAQ

Identification of errors in draft genome assemblies with single-base pair resolution for quality assessment and improvement
https://doi.org/10.1038/s41467-023-42336-w
MIT License

Default values of "sms_coverage" and "ngs_coverage" #2

Closed: CJ-Chen closed this issue 9 months ago

CJ-Chen commented 9 months ago
    --sms_coverage|-avgl            Average SMS coverage. Default: 100
    --ngs_coverage|-avgs            Average NGS coverage. Default: 100

Great job, and thank you to the development team. I am using this software in my project. I would like to ask whether "coverage" here refers to sequencing depth. If so, isn't the default of 100X rather high? Would it be more suitable to lower it to, for example, 30X?

JiaoLaboratory commented 9 months ago

Hi CJ-Chen, thank you for your support of CRAQ. Yes, this parameter is an estimate of the average sequencing depth. It is used mainly to validate regions with extremely high coverage (more than 5× the average depth). Such extreme coverage can have various causes. For example, when a genome is incompletely assembled, reads from the unassembled parts may be forcibly mapped to similar regions, producing very high coverage of clipping signals (false-positive errors). In our experience, this parameter has only a slight impact on the final results, but we advise users to roughly estimate their sequencing depth in advance. If you are uncertain, we recommend setting it somewhat higher, because for small microbial genomes (which usually have high sequencing depth), a setting of 30X might be too low.
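The rough depth estimate and the 5× rule described in the reply can be sketched as follows. This is an illustration only, not CRAQ's code: the function names are hypothetical, the per-window depths are assumed to come from some external depth tool, and the factor of 5 is taken from the maintainer's comment. How CRAQ treats flagged regions internally is not shown here.

```python
from typing import Iterable, List, Tuple


def estimate_average_depth(read_lengths: Iterable[int], genome_size: int) -> float:
    """Rough average depth: total sequenced bases divided by genome size.

    Hypothetical helper; the result is the kind of value one might pass
    to --sms_coverage/-avgl or --ngs_coverage/-avgs.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


def flag_extreme_windows(
    window_depths: List[float], avg_depth: float, factor: float = 5.0
) -> List[Tuple[int, float]]:
    """Return (window index, depth) for windows deeper than factor * avg_depth.

    Hypothetical helper; the default factor of 5 mirrors the ">5*average
    depth" threshold mentioned by the maintainers.
    """
    cutoff = factor * avg_depth
    return [(i, d) for i, d in enumerate(window_depths) if d > cutoff]


if __name__ == "__main__":
    # Made-up example: 50,000 reads of 10 kb over a 5 Mb microbial genome.
    avg = estimate_average_depth([10_000] * 50_000, 5_000_000)
    print(avg)  # 100.0

    # Made-up per-window depths; only window 2 exceeds 5 * 100 = 500.
    depths = [95, 110, 600, 102, 480]
    print(flag_extreme_windows(depths, avg))  # [(2, 600)]

    # Underestimating depth as 30X lowers the cutoff to 150,
    # so the 480X window is flagged as well.
    print(flag_extreme_windows(depths, 30.0))  # [(2, 600), (4, 480)]
```

The last call shows why a low setting can hurt on a high-depth microbial dataset: with an assumed average of 30X, the cutoff drops to 150X, so regions that are only moderately deep for a 100X dataset also cross the threshold.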

CJ-Chen commented 9 months ago

Thank you for your prompt response.