c-BIG / NPM-sample-qc

reference implementation of GA4GH WGS Quality Control Standards
https://c-big.github.io/NPM-sample-qc
MIT License

Desired behavior for mean_autosome_coverage in "GHIF WGS QC Metric definitions" googledoc unclear #43

Closed justinjj24 closed 1 year ago

justinjj24 commented 2 years ago

mean_autosome_coverage in "GHIF WGS QC Metric definitions" googledoc

| Field | Description | Format | Value in example implementation |
| --- | --- | --- | --- |
| REF | Genomic reference build | String | GRCh38 |
| BED | Genomic filtering regions | String | Homo_sapiens_assembly38.autosomes.bed |
| MIN_BQ | Minimum base quality | Integer | 0 |
| MIN_MQ | Minimum mapping quality | Integer | 20 |
| DUP | Are duplicates included? | Boolean | FALSE |
| CLP | Are clipped bases (hard and soft clipped) included? | Boolean | FALSE |
| OLP | Are overlapping bases included? | Boolean | FALSE |
| UMI | Are Unique Molecular Identifiers used to collapse reads? | Boolean | FALSE |
| SEC | Are secondary alignments included? | Boolean | FALSE |

mean_autosome_coverage, which is recommended to be computed with mosdepth, seems to imply that we are looking at the average (mean) sequencing depth across all autosomes. However, "Are overlapping bases included? | FALSE" seems to imply that the desired behavior is for overlapping bases to be excluded, so that what is counted is the fraction of the genome covered by at least one read.
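To make the two readings concrete, here is a minimal sketch that contrasts mean depth with breadth of coverage on a toy per-base depth vector. The function names and the depth values are hypothetical and chosen only for illustration; they do not come from mosdepth or from the NPM-sample-qc code.

```python
from typing import Sequence


def mean_depth(depths: Sequence[int]) -> float:
    """Average number of reads covering each position (reading 1)."""
    return sum(depths) / len(depths)


def breadth_of_coverage(depths: Sequence[int], min_depth: int = 1) -> float:
    """Fraction of positions covered by at least `min_depth` reads (reading 2)."""
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


# Hypothetical per-base depths over an 8 bp region.
toy_depths = [0, 2, 2, 4, 0, 0, 1, 3]

print(mean_depth(toy_depths))           # sum of depths / number of positions
print(breadth_of_coverage(toy_depths))  # positions with depth >= 1 / number of positions
```

The two quantities answer different questions, so the definition needs to say which one mean_autosome_coverage is meant to be.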

@nicolas-bertin or @skanwal, some clarification is needed here to resolve the ambiguity.

justinjj24 commented 2 years ago

Note: overlapping bases are excluded. Mosdepth avoids double-counting coverage when the ends of a paired-end sequencing fragment have overlapping alignments.
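As an illustration of what "avoids double-counting" means for the mean, the sketch below computes per-base depth for paired-end fragments in two ways: counting each mate independently, and counting the positions covered by a fragment only once. This is a toy model with hypothetical names and half-open intervals, not mosdepth's actual implementation.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]          # half-open [start, end), 0-based
Fragment = Tuple[Interval, Interval]  # (mate 1 alignment, mate 2 alignment)


def fragment_depth(
    region_len: int,
    fragments: Sequence[Fragment],
    count_overlap_once: bool = True,
) -> List[int]:
    """Per-base depth over a region of `region_len` bases.

    With count_overlap_once=True, a position covered by both mates of the
    same fragment contributes 1 to the depth instead of 2.
    """
    depth = [0] * region_len
    for mate1, mate2 in fragments:
        if count_overlap_once:
            # Union of the two mates: each position is counted once per fragment.
            positions = set(range(*mate1)) | set(range(*mate2))
            for pos in positions:
                depth[pos] += 1
        else:
            # Naive counting: each mate is counted independently.
            for start, end in (mate1, mate2):
                for pos in range(start, end):
                    depth[pos] += 1
    return depth


# One hypothetical fragment whose mates overlap at positions 4 and 5.
toy_fragments = [((0, 6), (4, 10))]

once = fragment_depth(10, toy_fragments, count_overlap_once=True)
naive = fragment_depth(10, toy_fragments, count_overlap_once=False)

print(sum(once) / len(once))    # mean depth with the overlap counted once
print(sum(naive) / len(naive))  # mean depth with the overlap counted twice
```

In this model the result is still a mean depth, not a fraction of covered positions; only the overlapping mate bases are prevented from inflating it.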

justinjj24 commented 2 years ago

https://github.com/c-BIG/NPM-sample-qc/blob/aa21ba718d0612426246bf14edd3c59385c3441a/bin/metrics/metrics.py#L103 should probably read "Overlapping bases from the two ends of paired-end reads are only counted once" instead of "Overlapping bases excluded", right? @nicolas-bertin or @skanwal, do you agree?

justinjj24 commented 1 year ago

See the latest metrics_definitions.md (PR-#7):

https://github.com/ga4gh/quality-control-wgs/blob/main/metrics_definitions/metrics_definitions.md