Open mbrush opened 4 years ago
We don’t necessarily have to limit the content of the VariationMetadata object to information for which there is no dedicated statement type. It could provide an efficient, concise way of bundling supporting information of any kind without the overhead of representing a statement object for each piece of information. For example, it could carry the affected gene, molecular consequence, functional impact, population frequency, or pathogenicity interpretation of the subject variation. But we initially propose the split between simpler/foundational VariationMetadata and Supporting Statements for more nuanced/complex information because:
point 3 - semantics - but can you change SNP to SNV? SNP is used in so many ways, and it creates misconceptions about frequency and/or pathogenicity...
I am not sure what you mean by ancestral allele - do you mean this in the context of an alignment across species? Otherwise I would assume it is the reference allele. We need to know the definition. Or are you trying to capture the instances where the reference transcript happens to include a "rarer" allele?
point 7 - I assume you mean predicted changes on the basis of the genetic sequence. How are you going to deal with variants that have multiple effects? We have seen coding variants that lead to leaky splicing and also to a missense change that alters protein function - and both of these effects were predicted bioinformatically.
I wonder about whether the 9 points of the object proposal are non-redundant, but perhaps the intention is to have deliberate redundancy to allow for "sanity-checking".
Point 1 defines a "label" for a variant, and HGVS is given as an example. I presume that "HGVS" means a complete and valid HGVS-compliant variant description. If so, that description will be sufficient to define most of the "structural types" mentioned in point 3. Similarly, the "reference allele" in point 4 will already have been defined by the reference sequence and the sequence alteration that are necessary parts of an HGVS-compliant variant description.
The one thing that I think might be essential, but is missing at present, is the genome build for the "reference allele" in point 4. There are perfectly valid variant labels (e.g. 17-50198002-C-A or chr17:50198002C>A) which might refer to GRCh37 or GRCh38.
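To illustrate the ambiguity, here is a minimal Python sketch that parses the two label styles mentioned above and refuses to produce a located variant unless a genome build is supplied. The class name `LocatedVariant`, the function `parse_label`, and the accepted build names are assumptions made for this example, not part of the proposal.

```python
import re
from dataclasses import dataclass

# Hypothetical set of accepted builds, for illustration only.
KNOWN_BUILDS = {"GRCh37", "GRCh38"}

# Matches labels such as "17-50198002-C-A".
_DASHED = re.compile(
    r"^(?:chr)?(?P<chrom>[0-9]{1,2}|X|Y|MT?)-(?P<pos>\d+)-(?P<ref>[ACGT]+)-(?P<alt>[ACGT]+)$"
)
# Matches labels such as "chr17:50198002C>A".
_ARROW = re.compile(
    r"^(?:chr)?(?P<chrom>[0-9]{1,2}|X|Y|MT?):(?P<pos>\d+)(?P<ref>[ACGT]+)>(?P<alt>[ACGT]+)$"
)


@dataclass(frozen=True)
class LocatedVariant:
    """A variant label made unambiguous by an explicit genome build."""
    genome_build: str
    chromosome: str
    position: int
    ref: str
    alt: str


def parse_label(label: str, genome_build: str) -> LocatedVariant:
    """Parse a positional label; the build must be given explicitly."""
    if genome_build not in KNOWN_BUILDS:
        raise ValueError(f"unknown or missing genome build: {genome_build!r}")
    for pattern in (_DASHED, _ARROW):
        match = pattern.match(label)
        if match:
            return LocatedVariant(
                genome_build=genome_build,
                chromosome=match["chrom"],
                position=int(match["pos"]),
                ref=match["ref"],
                alt=match["alt"],
            )
    raise ValueError(f"unrecognised variant label: {label!r}")


# The same label is a different variant under a different build.
on_38 = parse_label("17-50198002-C-A", "GRCh38")
on_37 = parse_label("17-50198002-C-A", "GRCh37")
same_on_38 = parse_label("chr17:50198002C>A", "GRCh38")
```

Here `on_38` and `same_on_38` compare equal because both label styles resolve to the same build, chromosome, position, and alleles, while `on_37` does not, which is the reason for carrying the build next to the reference allele.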
Proposing a new object that would offer a concise structure for presenting basic variant information that often accompanies variant annotations, but for which there is no dedicated Statement type. For example:
Structurally, this VariantLevelMetadata object (aka "Variant Details", as ClinVar calls an analogous object, or perhaps "VariantAttributes") would be packaged in a VariantAnnotation object alongside the primary Statement (rather than within it, to maintain the atomic character of Statements).
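The packaging described above might look roughly like the following Python sketch. All class and field names are hypothetical and chosen only for illustration (the field list borrows from points raised in this thread: label, structural type, reference allele, genome build), and the example values are placeholders rather than real annotation data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    """An atomic assertion about a variant; carries no variant metadata."""
    subject: str
    predicate: str
    object: str


@dataclass(frozen=True)
class VariantLevelMetadata:
    """Basic variant details with no dedicated Statement type (assumed fields)."""
    label: str
    structural_type: Optional[str] = None
    reference_allele: Optional[str] = None
    genome_build: Optional[str] = None


@dataclass(frozen=True)
class VariantAnnotation:
    """Wrapper that holds the primary Statement and the metadata side by side."""
    primary_statement: Statement
    variant_metadata: Optional[VariantLevelMetadata] = None


# Placeholder values, for illustration only.
statement = Statement(
    subject="chr17:50198002C>A",
    predicate="has_pathogenicity_interpretation",
    object="example-classification",
)
metadata = VariantLevelMetadata(
    label="chr17:50198002C>A",
    structural_type="SNV",
    reference_allele="C",
    genome_build="GRCh38",  # assumed here; the label alone does not say
)
annotation = VariantAnnotation(primary_statement=statement, variant_metadata=metadata)
```

Keeping the metadata on the wrapper rather than on `Statement` means a Statement can be reused or compared on its own, without dragging the bundled variant details along.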