c-BIG / NPM-sample-qc

reference implementation of GA4GH WGS Quality Control Standards
https://c-big.github.io/NPM-sample-qc
MIT License

ga4gh/quality-control-wgs metrics_definitions.md - Percent reads properly paired #85

Closed justinjj24 closed 1 year ago

justinjj24 commented 1 year ago

Description: The percentage of short paired-end sequencing high quality, properly paired reads, primary alignments, achieving a mapping quality of 0 or greater on GRCh38 assembly. Duplicated reads are included. It is critcal that the (BAM/CRAM) alignment files be readily marked for duplicated reads.

Implementation details: In the NPM-sample-QC reference implementation it is computed using samtools stats, reporting the percentage of properly paired reads mapped on GRCh38 assembly with MAPQ > 0. Duplicated reads are incuded.

incuded -> included

1) Since duplicated reads are included in the implementation, is it necessary to mention the comment below in the description, or should the code be changed to remove duplicates?

2) Would it be better to include the comment given below, or to remove "high quality" from the description? "High quality" reads are also not part of the description in the proposed GHIF WGS QC document.

justinjj24 commented 1 year ago

Could the text state "No minimum mapping quality is imposed" or "No filter on mapping quality is applied" instead of "MAPQ > 0"?

A mapping quality of zero in bwa means that the read maps to multiple locations with the same quality and that the mapper has picked one of these positions at random.

This process includes duplicate reads.
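To make the point about the threshold concrete, here is a minimal Python sketch that counts properly paired primary reads from (flag, MAPQ) pairs. It is not the NPM-sample-qc or samtools code: the function name `percent_properly_paired`, the toy `READS` list, and the choice of denominator (all primary reads) are assumptions for illustration. Only the flag bit values come from the SAM specification.

```python
from typing import Iterable, Tuple

# SAM flag bits, as defined in the SAM specification
PROPER_PAIR = 0x2
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def percent_properly_paired(
    reads: Iterable[Tuple[int, int]],
    min_mapq: int = 0,
    include_duplicates: bool = True,
) -> float:
    """Percentage of primary reads that are properly paired with MAPQ >= min_mapq.

    `reads` yields (flag, mapq) pairs. The denominator (all primary reads that
    pass the duplicate setting) is an assumption made for this sketch.
    """
    total = 0
    proper = 0
    for flag, mapq in reads:
        # Keep primary alignments only
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        # Optionally drop reads marked as duplicates
        if not include_duplicates and flag & DUPLICATE:
            continue
        total += 1
        if flag & PROPER_PAIR and mapq >= min_mapq:
            proper += 1
    return 100.0 * proper / total if total else 0.0


# Hypothetical toy data: (flag, mapq)
READS = [
    (99, 60),    # properly paired, first in pair
    (147, 60),   # properly paired, second in pair
    (99, 0),     # properly paired, multi-mapped (MAPQ 0)
    (147, 0),    # properly paired, multi-mapped (MAPQ 0)
    (1123, 60),  # properly paired, marked duplicate (99 + 1024)
    (65, 30),    # paired but not properly paired
    (355, 0),    # secondary alignment (99 + 256), ignored
]

no_filter = percent_properly_paired(READS)             # min_mapq=0 keeps every read
mapq_above_zero = percent_properly_paired(READS, 1)    # what "MAPQ > 0" would mean
without_dups = percent_properly_paired(READS, include_duplicates=False)
```

With a threshold of 0, the MAPQ 0 reads are still counted, so "MAPQ of 0 or greater" is equivalent to applying no mapping quality filter, whereas "MAPQ > 0" would exclude them.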

mhebrard commented 1 year ago

Similar to #84. Rephrase to clarify the MQ criteria and state that duplicated reads are included:

  • Description: The percentage of short paired-end sequencing high quality, properly paired reads, primary alignments, mapped on GRCh38 assembly. Duplicated reads are included. No minimum mapping quality is imposed.
  • Implementation details: In the NPM-sample-QC reference implementation it is computed using samtools stats, reporting the percentage of properly paired reads mapped on GRCh38 assembly. Duplicated reads are included. No mapping quality filter is applied.
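
As a rough illustration of reading this metric from `samtools stats` output, the sketch below parses the summary-number (`SN`) lines from text. It is not the NPM-sample-qc code: the helper `parse_sn`, the `SAMPLE` text, and its numbers are made up for the example, and it assumes the output contains a `percentage of properly paired reads (%)` line, which may depend on the samtools version.

```python
from typing import Dict


def parse_sn(stats_text: str) -> Dict[str, str]:
    """Collect the tab-separated 'SN' summary lines of samtools stats into a dict."""
    summary: Dict[str, str] = {}
    for line in stats_text.splitlines():
        if not line.startswith("SN\t"):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        # fields[1] is the label ending in ':', fields[2] is the value;
        # any trailing '# comment' field is ignored
        summary[fields[1].rstrip(":")] = fields[2]
    return summary


# Hypothetical excerpt; the values are invented for illustration
SAMPLE = (
    "# This file was produced by samtools stats\n"
    "SN\tsequences:\t2000\n"
    "SN\treads properly paired:\t1900\t# proper-pair bit set\n"
    "SN\tpercentage of properly paired reads (%):\t95.0\t# proper-pair bit set\n"
)

summary = parse_sn(SAMPLE)
pct_properly_paired = float(summary["percentage of properly paired reads (%)"])
```

In practice the text would come from running `samtools stats` on the BAM/CRAM file, without options that filter on mapping quality or exclude duplicate-flagged reads, which matches the wording proposed above.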