Closed lindaxiang closed 7 months ago
@lindaxiang Our team member @mhebrard presented last month (24/10) at the GA4GH Quality Control of WGS meeting, which included a comparison of QC metrics for NPM- and ARGO-generated 1KG samples (slides 6 and 7).
If you have a chance, kindly look at the correlation plots in the presentation shared here (https://drive.google.com/drive/folders/1Q280zSQfqQRo1q70Amq9PEOLkGVA9z1O) and provide your valuable feedback. We would like to understand the differences in the ARGO pipeline computation (tools, parameters, and data used, e.g. only autosomes and filtering of any blacklisted regions) for further discussion, and to see whether it could be functionally equivalent to the NPM metrics!
NPM implementation details... https://c-big.github.io/NPM-sample-qc/metrics.html
Thank you @justinjj24 !
I looked into the results comparison between NPM and ARGO. The three metrics that show discrepancies are:
In the ARGO pipeline, we retrieved these metrics from Picard CollectWgsMetrics (Picard version 3.0.0). The following is an example of the command we used:
```shell
CollectWgsMetrics \
    --INPUT NA20126.bam \
    --OUTPUT NA20126.CollectWgsMetrics.coverage_metrics \
    --INTERVALS autosomes_non_gap_regions.interval_list \
    --VALIDATION_STRINGENCY LENIENT \
    --REFERENCE_SEQUENCE GRCh38_hla_decoy_ebv.fa \
    --MINIMUM_MAPPING_QUALITY 20 \
    --MINIMUM_BASE_QUALITY 20 \
    --COVERAGE_CAP 250 \
    --LOCUS_ACCUMULATION_CAP 100000 \
    --STOP_AFTER -1 \
    --INCLUDE_BQ_HISTOGRAM false \
    --COUNT_UNPAIRED false \
    --SAMPLE_SIZE 10000 \
    --ALLELE_FRACTION 0.001 \
    --ALLELE_FRACTION 0.005 \
    --ALLELE_FRACTION 0.01 \
    --ALLELE_FRACTION 0.02 \
    --ALLELE_FRACTION 0.05 \
    --ALLELE_FRACTION 0.1 \
    --ALLELE_FRACTION 0.2 \
    --ALLELE_FRACTION 0.3 \
    --ALLELE_FRACTION 0.5 \
    --USE_FAST_ALGORITHM false \
    --READ_LENGTH 150 \
    --VERBOSITY INFO \
    --QUIET false \
    --COMPRESSION_LEVEL 5 \
    --MAX_RECORDS_IN_RAM 500000 \
    --CREATE_INDEX false \
    --CREATE_MD5_FILE false \
    --help false \
    --version false \
    --showHidden false \
    --USE_JDK_DEFLATER false \
    --USE_JDK_INFLATER false
```
Based on the Picard CollectWgsMetrics script, the assessment counts non-duplicated reads. Note: the interval file "autosomes_non_gap_regions.interval_list" was downloaded from NPM-sample-qc.
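For the comparison it helps to pull the relevant fields out of the CollectWgsMetrics output file programmatically. Below is a minimal sketch of such a parser, assuming the standard Picard metrics layout (`##` comment lines, a tab-separated header row following the `## METRICS CLASS` line, then a value row); the sample text and field values are made up for illustration.

```python
def parse_picard_metrics(text: str) -> dict:
    """Parse the first metrics table from a Picard-style metrics file.

    Assumes the usual layout: '##' comment lines, then a tab-separated
    header row on the line after '## METRICS CLASS', then one value row.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            values = lines[i + 2].split("\t")
            return dict(zip(header, values))
    return {}

# Synthetic example in the Picard metrics format (values are made up):
sample = (
    "## htsjdk.samtools.metrics.StringHeader\n"
    "# CollectWgsMetrics INPUT=NA20126.bam ...\n"
    "## METRICS CLASS\tpicard.analysis.WgsMetrics\n"
    "GENOME_TERRITORY\tMEAN_COVERAGE\tSD_COVERAGE\n"
    "2745186691\t30.5\t8.2\n"
)
metrics = parse_picard_metrics(sample)
# metrics["MEAN_COVERAGE"] -> "30.5"
```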
Please let me know what action items I should take to facilitate the effort.
| Implementation not aligned | NPM (mosdepth & datamash) | ARGO (picard CollectWgsMetrics) |
|---|---|---|
| --COUNT_UNPAIRED | True | False |
| --MINIMUM_BASE_QUALITY 20 | False | True |
| --ALLELE_FRACTION [0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.3, 0.5], used to calculate theoretical sensitivity (HET SNP); not relevant for the coverage calculation | False | True |
| 1,000bp window size: coverage is computed on 1,000bp windows and averaged over the region of interest | True | False |
| datamash madraw (mad_autosomes_coverage) | True | False |
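To make the last row concrete: datamash's `madraw` is the raw (unscaled) median absolute deviation, i.e. median(|x - median(x)|), with no 1.4826 normalization constant. A small sketch of what NPM's `mad_autosomes_coverage` computes over per-window coverages (the window values below are made-up illustration, not real data):

```python
import statistics

def madraw(values):
    """Raw median absolute deviation: median(|x - median(x)|).

    Matches datamash 'madraw' (no scaling; datamash 'mad' would
    additionally multiply by 1.4826).
    """
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)

# Hypothetical per-1,000bp-window mean coverages for a region:
windows = [28.0, 30.0, 31.0, 29.0, 35.0, 30.0]
median_cov = statistics.median(windows)  # 30.0
mad_cov = madraw(windows)                # 1.0
```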
NPM implementation details... https://c-big.github.io/NPM-sample-qc/metrics.html
Picard CollectWgsMetrics details... https://gatk.broadinstitute.org/hc/en-us/articles/360037226132-CollectWgsMetrics-Picard-
https://broadinstitute.github.io/picard/picard-metric-definitions.html#CollectWgsMetrics.WgsMetrics
@lindaxiang - As we discussed, here are a few action items on this PR:
- cross_contamination_rate precision: the NPM pipeline outputs the FREEMIX score, see metrics.py#L156
- You can find the doc here, and an example of a dockstore.yml to include in your pipeline there
- You might also need some work to ensure the pipeline can run on Nextflow Tower (with the default Nextflow profile)
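On the precision point, if the cross_contamination_rate discrepancy is only a matter of reporting precision, one option is to round both pipelines' scores to the same number of decimals before comparing. Purely illustrative sketch; the function name and the choice of 5 digits are assumptions, not NPM's actual code:

```python
def format_contamination(score: float, ndigits: int = 5) -> float:
    """Round a contamination estimate (e.g. a VerifyBamID FREEMIX score)
    to a fixed number of decimals so outputs from two pipelines can be
    compared directly. The 5-digit default is an illustrative assumption."""
    return round(score, ndigits)

# Two hypothetical scores that agree once reported at matched precision:
npm_score = 0.000123456
argo_score = 0.0001234789
aligned = format_contamination(npm_score) == format_contamination(argo_score)
```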
Please check if the PR fulfills these requirements
- [ ] You have a descriptive and meaningful commit message.
- [ ] Tests for the changes have been added (for bug fixes/features)
- [ ] Docs have been added / updated (for bug fixes / features)
- [ ] You have done your changes in a separate branch. Branches MUST have descriptive names that start with either the `fix/` or `feature/` prefixes. Good examples are: `fix/signin-issue` or `feature/issue-templates`.
- [ ] You have commented the code, particularly in hard-to-understand areas
What kind of change does this PR introduce? (Bug fix, new feature, docs update, ...)
What is the current behavior? (You can also link to an open issue here)
What is the new behavior? (if this is a feature change)
How is it being tested (test Configuration)?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
Other information: