ga4gh / quality-control-wgs

Home for the GA4GH Quality Control of Whole Genome Sequencing metrics and reference implementations
https://www.ga4gh.org/product/wgs-quality-control-standards/
Apache License 2.0

add novel `cross_contamination_rate` metric #17

Closed justinjj24 closed 1 year ago

justinjj24 commented 1 year ago

This metric was proposed and approved in principle at the GA4GH QC of WGS Standards follow-up meeting on Tuesday, Mar. 28.

1000 Genomes Project (1000g) dataset

Pre-calculated reference panel from the 1000 Genomes Project (1000g) dataset:

UDPath: https://raw.githubusercontent.com/Griffan/VerifyBamID/master/resource/1000g.phase3.100k.b38.vcf.gz.dat.UD
BedPath: https://raw.githubusercontent.com/Griffan/VerifyBamID/master/resource/1000g.phase3.100k.b38.vcf.gz.dat.bed
MeanPath: https://raw.githubusercontent.com/Griffan/VerifyBamID/master/resource/1000g.phase3.100k.b38.vcf.gz.dat.mu

VerifyBamID2

https://github.com/Griffan/VerifyBamID

All Of Us QC Report

https://www.researchallofus.org/wp-content/themes/research-hub-wordpress-theme/media/2022/06/All%20Of%20Us%20Q2%202022%20Release%20Genomic%20Quality%20Report.pdf

mhebrard commented 1 year ago

Add cross contamination in #7

[edit]

https://github.com/ga4gh/quality-control-wgs/blob/e0b8d5a7b8bd233a783616d427782df4582cd021/metrics_definitions/metrics_definitions.md?plain=1#L82-L88

mhebrard commented 1 year ago

Can we confirm the status of

mhebrard commented 1 year ago

From https://github.com/Griffan/VerifyBamID/tree/7e7172c2b45f61c72d35d94708a7a5c2f271ee3a

  --incl-flags [Int] Required flags: skip reads with mask bits unset [null]
  --excl-flags [Int] Filter flags: skip reads with mask bits set [UNMAP,SECONDARY,QCFAIL,DUP]

From https://github.com/Griffan/VerifyBamID/blob/master/main.cpp#L50-L65

  mplp_conf_t mplp;
  memset(&mplp, 0, sizeof(mplp_conf_t));
  mplp.min_mq = 2;
  mplp.min_baseQ = 13;
  mplp.capQ_thres = 40;
  mplp.max_depth = MPLP_MAX_DEPTH;
  mplp.max_indel_depth = MPLP_MAX_INDEL_DEPTH;
  mplp.openQ = 40;
  mplp.extQ = 20;
  mplp.tandemQ = 100;
  mplp.min_frac = 0.002;
  mplp.min_support = 1;
  bool noOrphan = false;
  mplp.flag = /*MPLP_NO_ORPHAN |*/ MPLP_REALN | MPLP_SMART_OVERLAPS;
  mplp.rflag_filter = BAM_FUNMAP | BAM_FSECONDARY | BAM_FQCFAIL | BAM_FDUP;
  mplp.output_fname = NULL;

This confirms that, by default, VerifyBamID2 excludes duplicates and secondary alignments (as well as unmapped and QC-failed reads), with min MQ = 2 and min BQ = 13.

We should follow best practice in the definition and require that duplicates (and secondary alignments) be excluded. We leave the mapping quality out of the formal definition and instead specify the default parameters of the reference implementation.
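To make the effect of these defaults concrete, here is a minimal sketch of how the exclusion mask and quality thresholds could be applied to a read. The flag values are the standard SAM FLAG bits (the same values as the htslib `BAM_F*` macros). The names `read_is_kept`, `base_is_kept` and `self_check` are hypothetical helpers for illustration, not part of VerifyBamID2, and the `>=` comparisons are an assumption about how the minimum thresholds are applied.

```cpp
#include <cassert>
#include <cstdint>

// SAM FLAG bits from the SAM specification (same values as htslib's BAM_F* macros).
constexpr std::uint16_t kFlagUnmapped  = 0x4;    // BAM_FUNMAP
constexpr std::uint16_t kFlagSecondary = 0x100;  // BAM_FSECONDARY
constexpr std::uint16_t kFlagQcFail    = 0x200;  // BAM_FQCFAIL
constexpr std::uint16_t kFlagDuplicate = 0x400;  // BAM_FDUP

// Mirrors mplp.rflag_filter in the quoted main.cpp snippet.
constexpr std::uint16_t kDefaultExcludeMask =
    kFlagUnmapped | kFlagSecondary | kFlagQcFail | kFlagDuplicate;

// Hypothetical helper: a read is kept when none of the excluded bits are set
// and its mapping quality reaches the minimum (assumed comparison: mapq >= min_mapq).
constexpr bool read_is_kept(std::uint16_t flag, int mapq,
                            std::uint16_t exclude_mask = kDefaultExcludeMask,
                            int min_mapq = 2) {
    return (flag & exclude_mask) == 0 && mapq >= min_mapq;
}

// Hypothetical helper: a base is kept when its quality reaches the minimum
// (assumed comparison: baseq >= min_baseq).
constexpr bool base_is_kept(int baseq, int min_baseq = 13) {
    return baseq >= min_baseq;
}

// Small runtime demonstration of the helpers above.
inline void self_check() {
    assert(read_is_kept(99, 60));                    // properly paired primary read
    assert(!read_is_kept(99 | kFlagDuplicate, 60));  // duplicate is excluded
    assert(!read_is_kept(99 | kFlagSecondary, 60));  // secondary alignment is excluded
    assert(!read_is_kept(99, 1));                    // mapping quality below 2
    assert(base_is_kept(13) && !base_is_kept(12));   // base quality threshold of 13
}
```

Note that the supplementary bit (0x800) is not part of this mask, so under these assumptions a supplementary alignment would pass the flag filter unless it is added to the exclusion flags explicitly.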

mhebrard commented 1 year ago

Update the definition to specify that duplicates are removed.