Handling differences between target sequences and reference sequences

MaveDB allows users to specify accession numbers from major genomic databases (Ensembl, RefSeq, UniProt) when depositing a target sequence. As we develop a validation framework for these accession numbers, it will be important to handle cases where a target sequence is similar to, but not identical to, the reference sequence.

There are many cases where this is useful. For example, one of the TP53 experiments in MaveDB was performed on a non-reference allele (see: https://mavedb.org/#/experiment-sets/urn:mavedb:00000068). To address this, the target was entered as "TP53 (P72R)" (e.g. for https://mavedb.org/#/score-sets/urn:mavedb:00000068-a-1).

If we wanted to associate this target with a transcript from RefSeq, we could:

- State that there is no match
- Choose the closest transcript
- Choose the closest transcript and document any differences between the target and the transcript

Of these, the last option seems like the best one: it keeps the link to a reference transcript while recording exactly how the target differs from it.

We should be able to do this in the API by adding associated GA4GH Variation Representation Specification (VRS) objects that describe the differences between the given reference sequence and the target sequence. From there we can build the necessary UI elements to convey this information to the user concisely.
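
As a rough illustration of what "documenting the differences" could look like, here is a minimal Python sketch that compares a target sequence to a reference sequence and emits one record per substituted position. The function name `describe_differences`, the toy sequences, and the `hypothetical-reference-id` identifier are all made up for this example. The dictionary layout is only loosely modeled on a VRS Allele (a location plus a literal sequence state), so the exact field names would need to be checked against whichever VRS version the API adopts. Coordinates are 0-based and half-open, and only equal-length substitutions are handled.

```python
from typing import Dict, List


def describe_differences(reference: str, target: str, reference_id: str) -> List[Dict]:
    """Return one VRS-like record per position where target differs from reference.

    Hypothetical helper for illustration only; it does not handle insertions
    or deletions, which would require a real alignment step.
    """
    if len(reference) != len(target):
        raise ValueError("this sketch only handles equal-length sequences")

    differences = []
    for position, (ref_residue, target_residue) in enumerate(zip(reference, target)):
        if ref_residue == target_residue:
            continue
        differences.append(
            {
                "type": "Allele",
                "location": {
                    "type": "SequenceLocation",
                    # Assumed field name; the identifier is a placeholder.
                    "sequence_id": reference_id,
                    # 0-based, half-open interval covering the changed residue.
                    "start": position,
                    "end": position + 1,
                },
                "state": {
                    "type": "LiteralSequenceExpression",
                    "sequence": target_residue,
                },
            }
        )
    return differences


# Toy sequences (not real TP53): a single P-to-R substitution at index 7.
reference = "MEEPQSDPSV"
target = "MEEPQSDRSV"
diffs = describe_differences(reference, target, "hypothetical-reference-id")
```

Here `diffs` contains a single record whose location spans `start=7`, `end=8` and whose state sequence is `"R"`. A list like this could be stored alongside the chosen accession and rendered in the UI as a short summary of how the target departs from the reference.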
