Closed ronakypatel closed 9 years ago
I will defer to the others on the team with more domain knowledge regarding this specific use case (good question btw). For what it's worth, here's my understanding (needs to be validated).
Let's create an example to work with...
Assume the exon referenced in the HGVS expression below starts at position 100. The expression would then indicate a 5-nucleotide deletion that starts 2 positions before the exon and runs 3 nucleotides into the beginning of the exon: c.100-2delGCACC (written as a full HGVS range, c.100-2_102delGCACC).
               99 |   (intron)    | 100
                v                   v
ref seq  AACGGCTA | atcta..ca g c | A C C TCATCT
alt seq  AACGGCTA | atcta..ca - - | - - - TCATCT
                              ^         ^
                           (100-2)    (102)
Our simpleAllele design does not currently explain how this type of deletion can be expressed, since only one end of the reference location requires an intronOffset value while the other falls within the NM_ reference sequence and uses an absolute position.
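To make the gap concrete, here is a minimal Python sketch in which each end of the span carries its own signed intron offset. The names `CdsPosition` and `span_length` are hypothetical and not part of the current model. It assumes both anchor positions lie in the same exon, so the intron length never has to be known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CdsPosition:
    """One end of a c. span (hypothetical structure, not the ClinGen model)."""
    base: int        # nearest exonic c. position, e.g. 100
    offset: int = 0  # signed intron offset, e.g. -2 for c.100-2; 0 means exonic


def span_length(start: CdsPosition, end: CdsPosition) -> int:
    """Count the reference bases covered by start..end, inclusive.

    Assumption: start.base and end.base are in the same exon, the start may
    reach back into the upstream intron (offset <= 0) and the end may reach
    forward into the downstream intron (offset >= 0).
    """
    if end.base < start.base:
        raise ValueError("end precedes start")
    if start.offset > 0 or end.offset < 0:
        raise ValueError("span shape not covered by this sketch")
    exonic = end.base - start.base + 1
    intronic = -start.offset + end.offset
    return exonic + intronic


# The toy example above: c.100-2 through c.102
print(span_length(CdsPosition(100, -2), CdsPosition(102)))  # 5
```

With an offset available on both ends, the exonic end simply keeps an offset of 0, so the same structure covers deletions entering an exon from either side.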
I think we will need to revisit our model design to find a solution, the result of which may potentially alter the current design. We will make this a priority.
@sharrison6 or @dazzariti, would either of you look into real-world examples of variants/alleles that cross splice sites and for which HGVS c. notation is also used?
Anyone else who has actual examples like this, please bring them to the meeting tomorrow so that we can assess how they may be recorded in our current model, or whether we will need to consider a modification to support them.
If you review the coordinate page under spanning there is an example of how the offset is managed.
I've tried to create a JSON-LD version; I'll admit it's rough. Any improvements/corrections are welcome.
{
  "@context": "http://clingen.org/models/SimpleAllele.jsonld",
  "@id": "http://clingendb.clingen.org/SimpleAllele/<id>",
  "@type": "SimpleAllele",
  "xref": [
    "http://clingen/alleleregistry/<id>"
  ],
  "simpleAlleleType": "transcript",
  "allele": "AATA",
  "primaryNucleotideChangeType": {
    "@type": "PrimaryNucleotideChangeType",
    "@id": "http://www.sequenceontology.org/browser/current_svn/term/SO:1000032",
    "display": "indel",
    "primary": "true"
  },
  "alleleName": [
    {
      "@type": "AlleleName",
      "nameType": "hgvs-cdna",
      "name": "NM_007294.3(BRCA1):c.5153-16_5156del20insAATA",
      "preferred": "true"
    }
  ],
  "referenceCoordinate": {
    "@type": "ReferenceCoordinate",
    "referenceSequence": "../../ReferenceSequence/NM_007294.3",
    "start": "41215387",
    "end": "41215406",
    "refAllele": "TCTATGATCTCTTTAGGGGT",
    "primaryTranscriptRegionType": {
      "@type": "primaryTranscriptRegionType",
      "@id": "http://www.sequenceontology.org/browser/current_svn/term/SO:0000162",
      "display": "splice_site",
      "primary": "true"
    },
    "ancillaryTranscriptRegionType": {
      "@type": "ancillaryTranscriptRegionType",
      "@id": "http://www.sequenceontology.org/browser/current_svn/term/SO:0001574",
      "display": "splice_acceptor_variant"
    },
    "intronOffsetStart": "16",
    "intronOffsetDirection": "-"
  }
}
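One cheap sanity check on a record like this is that the coordinates and the refAllele agree in length. A minimal Python sketch, assuming start and end are 1-based and inclusive (worth confirming against the coordinate page); `check_span` is a made-up helper, not part of any ClinGen tooling:

```python
def check_span(coord: dict) -> bool:
    """Return True when end - start + 1 equals len(refAllele).

    Assumes 1-based inclusive coordinates stored as strings, as in the
    JSON-LD draft above.
    """
    span = int(coord["end"]) - int(coord["start"]) + 1
    return span == len(coord["refAllele"])


# Values copied from the referenceCoordinate in the draft above
reference_coordinate = {
    "start": "41215387",
    "end": "41215406",
    "refAllele": "TCTATGATCTCTTTAGGGGT",
}
print(check_span(reference_coordinate))  # True
```

Both the coordinate span and the refAllele come out at 20 bases, which matches the del20 in the HGVS name.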
Here are 2 examples (a deletion that extends into the 5' intron and a deletion that extends into the 3' intron):
NM_003476.4(CSRP3):c.282-5_285delAACAGGTCC http://www.ncbi.nlm.nih.gov/clinvar/variation/44693/ This 9 bp deletion removes 5 intronic bases (c.282-5_282-1) and 4 exonic bases (c.282_285).
NM_133378.4(TTN):c.81493_81493+2delGGT http://www.ncbi.nlm.nih.gov/clinvar/variation/47486/ This 3 bp deletion removes c.81493G, c.81493+1G, and c.81493+2T.
In both cases we don't use a corresponding p. HGVS expression because the deletion removes the invariant region of the splice site sequence and thus is expected to disrupt splicing and lead to abnormal or absent protein.
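As a rough illustration of "removes the invariant region", the deleted sequence in each example can be split at the exon boundary to show the intronic dinucleotide next to the exon. This is only a sketch: `lost_splice_dinucleotide` is a hypothetical helper, offsets are passed in as signed integers, and it assumes the deletion crosses exactly one boundary.

```python
from typing import Optional, Tuple


def lost_splice_dinucleotide(
    start_offset: int, end_offset: int, deleted: str
) -> Optional[Tuple[str, str]]:
    """Return (site, intronic bases adjacent to the exon) for a deletion.

    start_offset / end_offset are the signed intron offsets of the two ends
    (0 for an exonic end). Handles only deletions crossing a single boundary.
    """
    if start_offset < 0 and end_offset == 0:
        intronic = deleted[:-start_offset]          # leading bases are intronic
        return "acceptor", intronic[-2:]            # positions -2 and -1
    if start_offset == 0 and end_offset > 0:
        intronic = deleted[len(deleted) - end_offset:]  # trailing bases are intronic
        return "donor", intronic[:2]                # positions +1 and +2
    return None


# CSRP3 c.282-5_285delAACAGGTCC
print(lost_splice_dinucleotide(-5, 0, "AACAGGTCC"))  # ('acceptor', 'AG')
# TTN c.81493_81493+2delGGT
print(lost_splice_dinucleotide(0, 2, "GGT"))         # ('donor', 'GT')
```

The AG and GT that come back are the usual acceptor and donor dinucleotides, which is why no p. expression is inferred for either variant.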
Thanks Shawn. I totally missed this. You are correct. The spanning section on the coordinate page is the key and your example is spot on. And thanks to Steven for providing some additional examples.
Ronak - if you need any further assistance figuring out the representation of the two previous examples provided by Steven, please let us know.
According to HGVS, I think the appropriate p. expressions when the canonical splice site is affected could be:
p.? - protein has not been analysed, an effect is expected but difficult to predict
p.0? - probably no protein is produced
Of course, if the protein or transcript is analyzed, then a more specific p. variant could be given.
If either the start OR the end of an allele is in the gene body and the other coordinate is in an intron, how should this be handled at the simple allele level?
Example is here: NM_007294.3(BRCA1):c.5153-16_5156del20insAATA http://www.ncbi.nlm.nih.gov/clinvar/variation/141142/
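For reference, here is a minimal Python sketch that pulls such an expression apart so that each end keeps its own offset. The regular expression covers only the narrow del/delins shapes quoted in this thread, not HGVS in general, and the output keys (`start_offset`, `end_offset`, and so on) are hypothetical names rather than fields of the current model.

```python
import re
from typing import Optional

# Narrow pattern: accession, optional (GENE), c. start[offset][_end[offset]],
# "del" with an optional sequence or length, optional "ins" sequence.
_HGVS_C_DEL = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+\.\d+)"
    r"(?:\((?P<gene>[^)]+)\))?"
    r":c\.(?P<start>\d+)(?P<start_offset>[+-]\d+)?"
    r"(?:_(?P<end>\d+)(?P<end_offset>[+-]\d+)?)?"
    r"del(?P<deleted>[ACGT]+|\d+)?"
    r"(?:ins(?P<inserted>[ACGT]+))?$"
)


def parse_c_deletion(expression: str) -> Optional[dict]:
    """Split a c. del/delins expression into parts; None if it does not match."""
    m = _HGVS_C_DEL.match(expression)
    if m is None:
        return None
    has_end = m.group("end") is not None
    end = m.group("end") if has_end else m.group("start")
    end_offset = m.group("end_offset") if has_end else m.group("start_offset")
    return {
        "accession": m.group("accession"),
        "gene": m.group("gene"),
        "start": int(m.group("start")),
        "start_offset": int(m.group("start_offset") or 0),  # signed, 0 = exonic
        "end": int(end),
        "end_offset": int(end_offset or 0),                 # signed, 0 = exonic
        "deleted": m.group("deleted"),    # sequence or length, as written
        "inserted": m.group("inserted"),
    }


parsed = parse_c_deletion("NM_007294.3(BRCA1):c.5153-16_5156del20insAATA")
print(parsed["start"], parsed["start_offset"], parsed["end"], parsed["end_offset"])
# 5153 -16 5156 0
```

Here the start carries the intron offset and the end is a plain exonic position, which is exactly the mixed case the question is about.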