c-BIG / NPM-sample-qc

reference implementation of GA4GH WGS Quality Control Standards
https://c-big.github.io/NPM-sample-qc
MIT License

ga4gh/quality-control-wgs metrics_definitions.md - Mean insert size #86

Closed justinjj24 closed 1 year ago

justinjj24 commented 1 year ago

Description: The mean insert size of short paired-end sequencing high quality, non duplicated, properly paired reads, primary alignments, achieving a mapping quality of 0 or greater on GRCh38 assembly.

Implementation details: In the NPM-sample-QC reference implementation it is computed using GATK Picard’s CollectInsertSizeMetrics, reporting the MEAN_INSERT_SIZE field. Only the non duplicated, properly paired reads mapped on GRCh38 assembly with MAPQ > 0 are considered.
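For readers who want to see where the MEAN_INSERT_SIZE value comes from, here is a minimal sketch of pulling that field out of a Picard metrics text file. It is not the NPM-sample-QC code: the helper name `read_picard_metrics` is hypothetical, and the example file content is made up and trimmed to a few columns (a real CollectInsertSizeMetrics output has many more).

```python
import io


def read_picard_metrics(handle):
    """Return the first metrics row of a Picard metrics file as a dict.

    Hypothetical helper: it relies only on the usual layout, where a
    '## METRICS CLASS' line is followed by a tab-separated header line
    and then the data row(s).
    """
    lines = iter(handle)
    for line in lines:
        if line.startswith("## METRICS CLASS"):
            break
    else:
        raise ValueError("no '## METRICS CLASS' section found")
    header = next(lines).rstrip("\n").split("\t")
    values = next(lines).rstrip("\n").split("\t")
    return dict(zip(header, values))


# Made-up, heavily trimmed example content (values are illustrative only).
example = io.StringIO(
    "## htsjdk.samtools.metrics.StringHeader\n"
    "# CollectInsertSizeMetrics (made-up example)\n"
    "\n"
    "## METRICS CLASS\tpicard.analysis.InsertSizeMetrics\n"
    "MEDIAN_INSERT_SIZE\tMEAN_INSERT_SIZE\tSTANDARD_DEVIATION\tPAIR_ORIENTATION\n"
    "398\t401.25\t95.5\tFR\n"
    "\n"
    "## HISTOGRAM\tjava.lang.Integer\n"
    "insert_size\tAll_Reads.fr_count\n"
)

mean_insert_size = float(read_picard_metrics(example)["MEAN_INSERT_SIZE"])
print(mean_insert_size)
```

In practice the same function could be called on an open file handle for the metrics file written by the pipeline.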

Would it be better to include the comment given below, or to remove "high quality" from the description? "High quality" reads are also not part of the description in the proposed GHIF WGS QC document.

justinjj24 commented 1 year ago

Should we state 'No minimum mapping quality is imposed' or 'No filter on mapping quality is applied' instead of 'MAPQ > 0 are considered'?

A mapping quality of zero in bwa means that the read maps to multiple locations with the same quality and that the mapper has picked one of these positions at random.

This process uses non-duplicate reads only.
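To make the difference between the two wordings concrete, here is a toy sketch with made-up reads. It is not Picard's logic (Picard applies its own read selection and tail trimming); the function `mean_insert_size` and its `min_mapq` parameter are hypothetical. Only the SAM FLAG bit values are taken from the SAM specification.

```python
from statistics import mean

# SAM FLAG bits (from the SAM specification).
PROPER_PAIR = 0x2
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def mean_insert_size(reads, min_mapq=0):
    """Mean TLEN over non-duplicate, properly paired, primary alignments.

    `reads` is a list of (flag, mapq, tlen) tuples. Each pair is counted
    once by keeping only the read with a positive TLEN (an assumption of
    this sketch, not Picard's rule).
    """
    sizes = [
        tlen
        for flag, mapq, tlen in reads
        if flag & PROPER_PAIR
        and not flag & (SECONDARY | DUPLICATE | SUPPLEMENTARY)
        and mapq >= min_mapq
        and tlen > 0
    ]
    return mean(sizes)


# Made-up reads: (flag, MAPQ, TLEN)
reads = [
    (99, 60, 400), (147, 60, -400),    # uniquely mapped pair
    (99, 0, 300), (147, 0, -300),      # multi-mapped pair, MAPQ 0
    (1123, 60, 500), (1171, 60, -500), # duplicate pair, always excluded
]

no_filter = mean_insert_size(reads, min_mapq=0)    # "MAPQ of 0 or greater"
strict_filter = mean_insert_size(reads, min_mapq=1)  # "MAPQ > 0"
print(no_filter, strict_filter)
```

With `min_mapq=0` the MAPQ 0 pair is kept, so the threshold imposes no filter at all; with `min_mapq=1` that pair is dropped and the mean changes. This is why the wording of the criterion matters.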

mhebrard commented 1 year ago

Fix typo, clarify MQ criteria & exclude duplicate