Open mbrush opened 5 years ago
Starting a list of requirements here based on the CQs and data examples - but fully aware this is naïve and incomplete. Hoping folks can help round this out (feel free to edit this list/comment directly)
Experimental provenance:
Quality metrics/flags:
"Flags for assessing whether the frequency was potentially affected by technical artifacts, such as low counts or a low-complexity genomic region"?
SNH comment suggested that there are quality scores specific to different sequencing technologies and platforms.
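As a strawman for discussion, here is a minimal sketch of how the two requirements above might look as a data structure. Everything in it is an assumption for illustration and not part of any agreed model: the names `QualityFlag`, `FrequencyQuality`, and `assess_flags`, the field names, and especially the `min_allele_number` threshold of 100, which is an arbitrary placeholder.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class QualityFlag(Enum):
    """Hypothetical flags for artifacts that may affect a reported frequency."""
    LOW_COUNT = "low_count"
    LOW_COMPLEXITY_REGION = "low_complexity_region"


@dataclass
class FrequencyQuality:
    """Hypothetical container for quality metadata attached to a frequency."""
    flags: List[QualityFlag] = field(default_factory=list)
    # Free-text platform label; a real model would likely use a coded value.
    platform: Optional[str] = None
    # Platform-specific scores keyed by score name (names are not standardized here).
    platform_scores: Dict[str, float] = field(default_factory=dict)


def assess_flags(allele_number: int,
                 in_low_complexity_region: bool,
                 min_allele_number: int = 100) -> List[QualityFlag]:
    """Return the flags that apply; the default threshold is a placeholder."""
    flags = []
    if allele_number < min_allele_number:
        flags.append(QualityFlag.LOW_COUNT)
    if in_low_complexity_region:
        flags.append(QualityFlag.LOW_COMPLEXITY_REGION)
    return flags


# Example: a frequency computed from few alleles in a low-complexity region.
quality = FrequencyQuality(
    flags=assess_flags(allele_number=40, in_low_complexity_region=True),
    platform="example-platform",
    platform_scores={"example_score": 0.9},
)
```

Keeping the generic flags separate from the platform-specific scores is one possible way to accommodate both requirements; whether the scores belong in an open-ended map or in typed fields is exactly the kind of question this ticket is meant to settle.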
Questions/Considerations:
The evidence and provenance CQs here (rows 16-20 specifically) highlight the need to capture metadata on sequencing data quality, technology, and other aspects of variant detection that are relevant as provenance in population frequency studies. This ticket will be used to elicit input and feedback on this topic.
Our short-term goal is to define a simpler model for a v0 release in the near future. Longer term, we will want to coordinate with other GA4GH efforts, and with drivers/stakeholders/partners like HL7-CG, to develop a richer, shared model of sequencing studies and their metadata. That model should cover broad use cases, and could be submitted to schema blocks and re-used by the broader community.
We need folks versed in this area to make recommendations about what is really needed and how we might structure it: specifically, the types of technologies and protocols applied in population sequencing efforts, and the specific metrics/scores used to assess sequencing data quality/reliability. Within our VA group, Irina is knowledgeable here, and Steven Hart has added some insightful comments as well. I also suspect that the HL7 Clinical Genomics group has thought about this issue, so Bob and/or one of his colleagues could advise as well. Please note if there are others we might reach out to.