VariantEffect / mavehgvs

A specification and Python implementation for representing variants from Multiplexed Assays of Variant Effect.
BSD 3-Clause "New" or "Revised" License

Support for multi-variants across multiple sequences #35

Open afrubin opened 1 year ago

afrubin commented 1 year ago

To support the definition of multiple target sequences for a single MaveDB score set, mavehgvs will need to support multi-variants across multiple target sequences.

The nomenclature for doing this is defined in HGVS:

I have a patient with hearing loss and variants in the GJB2 (c.35delG) and GJB6 (c.689_690insT) genes, how should I describe this? (Nancy Carson, Ottawa, Canada)

The recommendation is to use the format GJB2:c.[35delG] GJB6:c.[689_690insT]. This uses standard HGVS descriptions and prevents confusion regarding which variant was found in which gene. Note it is essential that you also define the coding DNA reference sequence used. Another format, coping with this directly, is to describe the variants as NM_004004.2:c.[35delG] NM_006783.1:c.[689_690insT], i.e. using the Genbank reference sequences instead of the HGNC approved Gene Symbol.
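
To make the quoted format concrete, here is a minimal sketch of how such a string could be split into per-target pieces. It is not part of mavehgvs or the biocommons parser: the function name `split_multi_target` and the regular expression are hypothetical, and the sketch assumes segments are separated by whitespace and that variants within one bracket are separated by semicolons.

```python
import re
from typing import List, Tuple

# Hypothetical pattern for one segment: "<reference>:<prefix>.[<variants>]".
# The reference may be a gene symbol (GJB2) or an accession (NM_004004.2).
_SEGMENT = re.compile(
    r"^(?P<ref>[^\s:]+):(?P<prefix>[a-z])\.\[(?P<body>[^\[\]]+)\]$"
)


def split_multi_target(description: str) -> List[Tuple[str, str, List[str]]]:
    """Split a multi-target description into (reference, prefix, variants) tuples.

    Assumes whitespace-separated segments; raises ValueError on any segment
    that does not match the hypothetical pattern above.
    """
    segments = []
    for token in description.split():
        match = _SEGMENT.match(token)
        if match is None:
            raise ValueError(f"unrecognized segment: {token!r}")
        # Variants inside one bracket are assumed to be ';'-separated.
        variants = match["body"].split(";")
        segments.append((match["ref"], match["prefix"], variants))
    return segments


if __name__ == "__main__":
    for segment in split_multi_target("GJB2:c.[35delG] GJB6:c.[689_690insT]"):
        print(segment)
```

Keeping the reference attached to each segment is what "prevents confusion regarding which variant was found in which gene"; the individual variant strings would still need to be validated separately.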

Related tasks:

bencap commented 1 week ago

Note that, right now, although multi-variants would be validated by our biocommons HGVS parser, their prefixes are not validated properly by our internal library code.
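
As a rough illustration of the missing check, the sketch below flags segments whose prefix is not in an allowed set. Everything here is an assumption made for illustration: `invalid_prefixes` and `ALLOWED_PREFIXES` are hypothetical names, not existing library code, and the prefix set would need to be confirmed against the mavehgvs specification.

```python
from typing import List

# Assumption: single-letter prefixes accepted for this sketch only;
# confirm the real set against the mavehgvs specification.
ALLOWED_PREFIXES = frozenset("cgmnopr")


def invalid_prefixes(description: str) -> List[str]:
    """Return the whitespace-separated segments whose prefix is missing or not allowed."""
    bad = []
    for token in description.split():
        # Drop an optional "<reference>:" part; accessions such as
        # NM_004004.2 contain a '.', so split on ':' first.
        _, sep, rest = token.partition(":")
        if not sep:
            rest = token
        prefix, dot, _ = rest.partition(".")
        if not dot or prefix not in ALLOWED_PREFIXES:
            bad.append(token)
    return bad


if __name__ == "__main__":
    print(invalid_prefixes("GJB2:c.[35delG] GJB6:x.[689_690insT]"))
```

A check like this only covers the prefix of each segment; it says nothing about whether the variant description after the prefix is well-formed.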