c-BIG / NPM-sample-qc

reference implementation of GA4GH WGS Quality Control Standards
https://c-big.github.io/NPM-sample-qc
MIT License

ga4gh/quality-control-wgs metrics_definitions.md - Genome coverage uniformity #87

Closed justinjj24 closed 1 year ago

justinjj24 commented 1 year ago

Description: The median absolute deviation of sequencing coverage derived from short paired-end sequencing high quality, non duplicated reads, primary alignments, achieving a mapping quality of 20 or greater, in autosomes non gap regions of GRCh38 assembly. Clipped bases are expluded. Overlapping bases are counted only once. It is critical that the (BAM/CRAM) alignment files be readily marked for duplicated reads and clipped bases.

Implementation details: In the NPM-sample-QC reference implementation, the genome-wide sequencing coverage of non duplicated reads, non clipped bases, non overlapping bases, primary alignments, achieving a mapping quality of 20 or greater is derived from mosdepth v0.3.2. It is further narrowed down to the non gap regions of GRCh38 assembly, autosomes only using bedtools intersect. The median absolute deviation of the coverage is then calculated using datamash.
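For readers who want to check the last step by hand, here is a minimal Python sketch of the median absolute deviation over per-base coverage. It is not the NPM-sample-QC code: the function name `coverage_mad`, the `(start, end, depth)` interval layout, and the `scale` parameter are assumptions made for illustration. The `scale` parameter is there because some tools report the MAD multiplied by 1.4826 (a normal-consistency constant) while others report the raw value; which one the pipeline uses is not stated here.

```python
from statistics import median


def coverage_mad(intervals, scale=1.0):
    """Median absolute deviation of per-base depth.

    intervals: iterable of (start, end, depth) tuples, half-open and
    BED-style, assumed to be already restricted to the regions of
    interest (hypothetical input layout, not the pipeline's format).
    scale: multiplier applied to the raw MAD (1.0 means unscaled).
    """
    depths = []
    for start, end, depth in intervals:
        # Each interval contributes one depth value per base it spans.
        depths.extend([depth] * (end - start))
    if not depths:
        raise ValueError("no bases to summarise")
    med = median(depths)
    # MAD: median of the absolute deviations from the median depth.
    return scale * median(abs(d - med) for d in depths)


# Toy input: 2 bases at 10x, 3 bases at 30x, 1 base at 50x.
example = [(0, 2, 10), (2, 5, 30), (5, 6, 50)]
print(coverage_mad(example))  # 10.0
```

Expanding every interval to per-base values keeps the sketch short but is memory-hungry for a whole genome; a weighted median over intervals would be the practical choice there.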

expluded -> excluded

Should we include the comment given below, or remove "high quality" from the description? "High quality" reads are also not part of the description in the proposed GHIF WGS QC document.

mhebrard commented 1 year ago

Fix typo. Keep "high quality".