Designating a transcript for an interpretation

wrightmw commented 3 years ago

On the VCI WG call today (8/13/20), the issue of transcript selection was raised. Currently, we use CA IDs and/or ClinVar IDs to designate a variant to start an interpretation in the VCI. There can be multiple transcripts associated with a CA/ClinVar ID, so it would be great to add the ability for curators to specify a transcript before they publish an interpretation from the VCI (to the ERepo and/or for a ClinVar submission). We need to discuss at what point in the curation process this selection should be made.

larrybabb commented 3 years ago

Thanks for getting this into a more official request, @wrightmw. As we noted, using a CAID or ClinVar VariationID does not precisely describe what "variant" is being curated. I recognize we all agree that CAIDs and VariationIDs are a critical tool for grouping variants that are different forms of the underlying genomic change, lifted over to various builds and projected onto all possible transcripts and then onto their predicted protein variants. These variation sets have an inherent meaning, which the GA4GH GKS efforts are trying to clarify and provide standard policies for. Hopefully those will provide a path forward that contributes to a solution for how CAIDs and VariationIDs can be used effectively in computational data sharing.

As far as this task goes, we simply need to modify the VCI so that after a user picks a CAID or VariationID, they are presented with the list of variants in the set and can select the representative form that best describes what is being curated. While selecting the transcript will solve the problem for all variants that have at least one transcript, it will not solve the issue when the variant is not aligned with a transcript. I would think we could default to the MANE Select transcript, but this should be modifiable at any time, as long as the interpretation stays within a form of the variant that is in the original VariationID or CAID set. We wouldn't necessarily want users to change CAIDs or VariationIDs once they have started an interpretation, since these provide the genomic grounding of the variant that all evidence and data are based on.
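A minimal sketch of the selection rule described above, in Python. This is an illustration only and not the VCI's data model: `ContextualForm`, `RepresentativeSelection`, and their fields are hypothetical names, and the fallback to the first form in the set when no MANE Select form exists is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ContextualForm:
    """One member of a CAID / VariationID set (hypothetical structure)."""
    hgvs: str                     # genomic, transcript, or protein expression
    is_mane_select: bool = False  # True if this form is on the MANE Select transcript


class RepresentativeSelection:
    """Tracks which form of a variant set represents an interpretation."""

    def __init__(self, set_id: str, forms: List[ContextualForm]):
        if not forms:
            raise ValueError("variant set has no contextual forms")
        self.set_id = set_id      # CAID or VariationID; not changed after creation
        self.forms = list(forms)
        self.selected = self._default_form()

    def _default_form(self) -> ContextualForm:
        # Default to the MANE Select form when the set has one.
        for form in self.forms:
            if form.is_mane_select:
                return form
        # Assumption: otherwise fall back to the first form (e.g. a genomic one),
        # which covers variants that are not aligned with any transcript.
        return self.forms[0]

    def select(self, hgvs: str) -> ContextualForm:
        """Change the representative form; it must belong to the original set."""
        for form in self.forms:
            if form.hgvs == hgvs:
                self.selected = form
                return form
        raise ValueError(f"{hgvs!r} is not a member of set {self.set_id}")
```

The selection can be changed any number of times, but `select` rejects any form outside the original set, which keeps the interpretation tied to the same genomic grounding.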

I would like to suggest that we require the user to select ANY ONE of the "contextual" forms of the variant provided by the CAID and/or VariationID sets as the form that best represents the final Variant Pathogenicity assertion being made. This would not only handle situations where there are overlapping genes for a particular genomic variant, but also provide a specific form for non-coding variants. In the end, there are a number of scenarios that require a precise contextual form as the "primary" or "representative" form of the variant for the curation that is occurring. For example, if we only had a CAID or VariationID as the primary form associated with a variant pathogenicity assertion, it would not be clear which representative form the original author of the statement would choose when submitting the statement to ClinVar. There are other downstream uses of the shared variant pathogenicity statements that require the notion of a primary or representative contextual variant (e.g. for display, for context of gene/transcript, for submitting to ClinVar, for associating between CAID sets and VariationID sets or CIViC variation sets, etc.).

Let me know if you'd like me to clarify any of these design requirements. They are something I've been wrestling with as a user of the computational form of VCI-approved interpretations, both for sending to the ERepo and for submitting to ClinVar. Plus, this would be a great time to align with the direction that the GA4GH is moving in.

cgpreston commented 3 years ago

Use case from Erin 1.20.21:

3 variants curated in the gene CDKL5 are displaying with variant names that indicate that they are in another gene, RS1. The ClinVar IDs link to the correct (intended) variants, but the display name is incorrect. The evidence entered corresponds to what is expected to be present for the intended variant.

They are:

NM_000330.4(RS1):c.185-3188G>A (https://curation.clinicalgenome.org/variant-central/56319f0b-8a37-44c9-9ad3-7ca50bd4e6e6/interpretation/d1466290-ae92-4343-88ba-88e98b897c23)
This variant is SUPPOSED to be NM_003159.2(CDKL5):c.2908C>T (p.Arg970Ter) (https://www.ncbi.nlm.nih.gov/clinvar/variation/143812/)

NM_000330.4(RS1):c.185-3176C>T (https://curation.clinicalgenome.org/variant-central/5e14e632-8e61-4dd5-b0e1-9e663317729c/interpretation/6358edd6-1663-4d8a-9784-64f135a14164)
This variant is SUPPOSED to be NM_003159.2(CDKL5):c.2896G>A (p.Val966Ile) (https://www.ncbi.nlm.nih.gov/clinvar/variation/210646/)

NM_000330.4(RS1):c.185-3207G>A (https://curation.clinicalgenome.org/variant-central/0ba14c02-920e-40ee-a2cb-f1d1db9cc3cc/interpretation/68b2dec4-9be9-413d-8245-733323b08447)
This variant is SUPPOSED to be NM_003159.2(CDKL5):c.2927C>T (p.Pro976Leu) (https://www.ncbi.nlm.nih.gov/clinvar/variation/156694/)
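A mismatch like the one in this use case could in principle be flagged automatically. The sketch below, in Python, pulls the gene symbol out of a display name of the form `NM_…(GENE):c.…` and compares it with the gene being curated. The function names are hypothetical, and the regular expression only handles names shaped like the ones in this comment; it is not a general HGVS parser.

```python
import re
from typing import Optional

# Matches names such as "NM_000330.4(RS1):c.185-3188G>A" and captures "RS1".
_GENE_IN_NAME = re.compile(r"^[A-Z]{2}_\d+\.\d+\(([^)]+)\):")


def gene_from_display_name(name: str) -> Optional[str]:
    """Return the gene symbol embedded in a display name, or None if absent."""
    match = _GENE_IN_NAME.match(name.strip())
    return match.group(1) if match else None


def display_name_matches_gene(name: str, curated_gene: str) -> bool:
    """True only when the display name carries the curated gene's symbol.

    Names without an embedded gene symbol also return False, so a caller
    may want to check gene_from_display_name() first to tell the cases apart.
    """
    gene = gene_from_display_name(name)
    return gene is not None and gene.upper() == curated_gene.upper()
```

Applied to the first variant above with CDKL5 as the curated gene, the RS1 display name is reported as a mismatch, while the intended CDKL5 name passes.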

cgpreston commented 2 years ago

Another use case is a desire to use a minor transcript as identified by the Monogenic Diabetes VCEP. Discussed on Tools call 8/5/21. General PI recommendation is to continue collecting use cases and soliciting feedback on them. Users will be encouraged to enter the transcript selection (if different from MANE) in the evidence summary, and to contact MANE to update their transcript.

ClinGen / clincoded

Designating a transcript for an interpretation #2234