clingen-data-model / allele

Documentation for data model of ClinGen

Across region allele #125

Closed: ronakypatel closed this issue 9 years ago

ronakypatel commented 9 years ago

If either the start OR the end of an allele is in the gene body (exonic sequence) and the other coordinate is in an intron, how should this be handled at the simple allele level?

Example is here: NM_007294.3(BRCA1):c.5153-16_5156del20insAATA http://www.ncbi.nlm.nih.gov/clinvar/variation/141142/
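As a quick consistency check on this example, the range covers 16 intronic bases (c.5153-16 through c.5153-1) plus 4 exonic bases (c.5153 through c.5156), which agrees with the del20 in the expression:

$$
n_{\text{del}} = 16 + (5156 - 5153 + 1) = 16 + 4 = 20
$$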

srynobio commented 9 years ago

You would need to use intronOffsetEnd similar to this example.

larrybabb commented 9 years ago

I will defer to the others on the team with more domain knowledge regarding this specific use case (good question btw). For what it's worth, here's my understanding (needs to be validated).

Let's create an example to work with...

Assume the exon referenced in the HGVS expression below starts at position c.100. The expression then indicates that 5 nucleotides are deleted, starting 2 positions before the exon: the last 2 intronic bases (c.100-2 and c.100-1) plus the first 3 exonic bases (c.100 to c.102). Because more than one nucleotide is deleted, HGVS writes this as a range: c.100-2_102delGCACC

                 99 |    (intron)   |100
                  v                   v
ref seq    AACGGCTA | atcta..ca g c | A C C TCATCT
alt seq    AACGGCTA | atcta..ca - - | - - - TCATCT
                                ^         ^
                             (100-2)    (102)
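The diagram can be mirrored with a toy list of bases, each tagged with its c. position. This is a hypothetical sketch, not part of the model: the labels c.103 to c.108 are simply counted on from c.100, the elided middle of the intron is left out, and the helper name `delete_span` is made up for illustration.

```python
from typing import List, Tuple

# Bases from the diagram, each tagged with its c. position. Intronic bases use
# the "<first exon base>-<distance>" form; lower case marks intron sequence.
# The elided middle of the intron ("atcta..ca") is left out of this toy list.
TOY: List[Tuple[str, str]] = [
    ("99", "A"),
    ("100-2", "g"), ("100-1", "c"),
    ("100", "A"), ("101", "C"), ("102", "C"),
    ("103", "T"), ("104", "C"), ("105", "A"),
    ("106", "T"), ("107", "C"), ("108", "T"),
]


def delete_span(seq: List[Tuple[str, str]], start: str, end: str):
    """Remove the bases labelled start..end (inclusive); return the deleted
    bases and the remaining (label, base) pairs."""
    labels = [label for label, _ in seq]
    i, j = labels.index(start), labels.index(end)
    deleted = "".join(base for _, base in seq[i:j + 1])
    return deleted, seq[:i] + seq[j + 1:]


deleted, remaining = delete_span(TOY, "100-2", "102")
print(deleted.upper())                          # GCACC
print("".join(base for _, base in remaining))   # ATCATCT (intron middle omitted)
```

The deleted string mixes lower-case intronic and upper-case exonic bases, which is exactly the situation the simple allele has to describe with one intronic end and one exonic end.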

Our simpleAllele design does not currently explain how this type of deletion can be expressed, since only one end of the reference location requires an intronOffset value while the other falls within the NM_ reference sequence at an absolute position.
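One way to picture the per-end offsets discussed in this thread (intronOffsetStart / intronOffsetEnd) is to let each end of the location carry its own intron offset, with zero meaning "exonic". This is a minimal sketch with hypothetical class names; it uses signed offsets for brevity, whereas the JSON-LD draft in this thread uses an unsigned offset plus a separate direction field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundaryPosition:
    """One end of a transcript location: a c. position anchored on an exonic
    base, plus a signed intron offset (0 = the base itself is exonic).
    Hypothetical illustration only; not the ClinGen SimpleAllele schema."""
    position: int
    offset: int = 0

    def to_hgvs(self) -> str:
        # Exonic ends print as "102"; intronic ends as "100-2" or "81493+2".
        if self.offset == 0:
            return str(self.position)
        return f"{self.position}{self.offset:+d}"


@dataclass(frozen=True)
class SpanningLocation:
    """A range whose start and end each carry their own (possibly zero) offset,
    so an intron-to-exon span needs no special case."""
    start: BoundaryPosition
    end: BoundaryPosition

    def to_hgvs(self) -> str:
        return f"c.{self.start.to_hgvs()}_{self.end.to_hgvs()}"


# The diagram's deletion: starts 2 bases into the intron, ends at exonic c.102.
loc = SpanningLocation(BoundaryPosition(100, -2), BoundaryPosition(102))
print(loc.to_hgvs())  # c.100-2_102
```

With this shape, an exon-to-intron span such as c.81493_81493+2 is just `SpanningLocation(BoundaryPosition(81493), BoundaryPosition(81493, 2))`.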

I think we will need to revisit our model design to find a solution, which may alter the current design. We will make this a priority.

larrybabb commented 9 years ago

@sharrison6 or @dazzariti: would either of you look into real-world examples of variants/alleles that cross splice sites and are described using HGVS c. notation?

Anyone else who has actual examples like this, please bring them to the meeting tomorrow so that we can assess how they could be recorded in our current model, or whether we need to modify the model to support them.

srynobio commented 9 years ago

If you review the spanning section of the coordinate page, there is an example of how the offset is managed.

I've tried to create a JSON-LD version; I'll admit it's rough, so improvements/corrections are welcome.

{
  "@context":"http://clingen.org/models/SimpleAllele.jsonld",
  "@id": "http://clingendb.clingen.org/SimpleAllele/<id>",
  "@type":"SimpleAllele",
  "xref": 
  [
    "http://clingen/alleleregistry/<id>"
  ],
  "simpleAlleleType": "transcript",
  "allele": "AATA",
  "primaryNucleotideChangeType": 
  {
    "@type":"PrimaryNucleotideChangeType",
    "@id": "http://www.sequenceontology.org/browser/current_svn/term/SO:1000032",
    "display": "indel",
    "primary": "true"
  },
  "alleleName": 
  [
    {
      "@type":"AlleleName",
      "nameType": "hgvs-cdna",
      "name": "NM_007294.3(BRCA1):c.5153-16_5156del20insAATA",  
      "preferred": "true"
    }
  ],
  "referenceCoordinate": 
  {
    "@type":"ReferenceCoordinate",
    "referenceSequence":"../../ReferenceSequence/NM_007294.3",
    "start": "41215387",
    "end": "41215406",  
    "refAllele": "TCTATGATCTCTTTAGGGGT",
    "primaryTranscriptRegionType":
    {
      "@type":"primaryTranscriptRegionType",
      "@id":"http://www.sequenceontology.org/browser/current_svn/term/SO:0000162",
      "display":"splice_site",
      "primary":"true"
    },
    "ancillaryTranscriptRegionType":
    {
      "@type":"ancillaryTranscriptRegionType",
      "@id":"http://www.sequenceontology.org/browser/current_svn/term/SO:0001574", 
      "display":"splice_acceptor_variant"
    },
    "intronOffsetStart": "16",
    "intronOffsetDirection": "-"
  }
}
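A draft like this can be sanity-checked by confirming that the start/end span agrees with the length of refAllele (and with the del20 in the HGVS name). This sketch assumes start and end are inclusive, 1-based positions, which the draft does not state, and `check_span` is a hypothetical helper; it tests only internal consistency, not which sequence the numbers refer to.

```python
import json

# Trimmed copy of the relevant fields from the JSON-LD draft above.
DRAFT = """
{
  "allele": "AATA",
  "referenceCoordinate": {
    "start": "41215387",
    "end": "41215406",
    "refAllele": "TCTATGATCTCTTTAGGGGT",
    "intronOffsetStart": "16",
    "intronOffsetDirection": "-"
  }
}
"""


def check_span(doc: dict) -> int:
    """Return the span length, raising if it disagrees with refAllele."""
    coord = doc["referenceCoordinate"]
    start, end = int(coord["start"]), int(coord["end"])
    span = end - start + 1  # inclusive coordinates assumed
    if span != len(coord["refAllele"]):
        raise ValueError(f"span {span} != refAllele length {len(coord['refAllele'])}")
    return span


doc = json.loads(DRAFT)
span = check_span(doc)
print(span)                       # 20
print(len(doc["allele"]) - span)  # -16: 20 reference bases replaced by 4
```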
sharrison6 commented 9 years ago

Here are 2 examples (a deletion that extends into the 5' intron and a deletion that extends into the 3' intron):

NM_003476.4(CSRP3):c.282-5_285delAACAGGTCC http://www.ncbi.nlm.nih.gov/clinvar/variation/44693/ This 9 bp deletion deletes 5 intronic bases (c.282-5_282-1) and 4 exonic bases (c.282_285).

NM_133378.4(TTN):c.81493_81493+2delGGT http://www.ncbi.nlm.nih.gov/clinvar/variation/47486/ This 3 bp deletion deletes c.81493G, c.81493+1G, and c.81493+2T.
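The intronic/exonic split described for both examples can be computed from the c. positions alone. This is a minimal sketch with hypothetical helper names, assuming simple positions of the form `N`, `N-k` or `N+k` and a range that crosses exactly one exon/intron boundary; UTR positions such as `-12` or `*34` are not handled.

```python
import re
from typing import Tuple

# Simple c. positions such as "282", "282-5" or "81493+2".
POS = re.compile(r"^(\d+)([+-]\d+)?$")


def parse_pos(text: str) -> Tuple[int, int]:
    """Split a c. position into (exonic anchor, signed intron offset)."""
    m = POS.match(text)
    if not m:
        raise ValueError(f"unsupported c. position: {text!r}")
    return int(m.group(1)), int(m.group(2) or 0)


def count_bases(start: str, end: str) -> Tuple[int, int]:
    """Return (intronic, exonic) base counts for a c.start_end range that
    crosses exactly one exon/intron boundary."""
    (s_pos, s_off), (e_pos, e_off) = parse_pos(start), parse_pos(end)
    if s_off < 0 and e_off == 0:   # starts in the upstream intron, ends in the exon
        return -s_off, e_pos - s_pos + 1
    if s_off == 0 and e_off > 0:   # starts in the exon, ends in the downstream intron
        return e_off, e_pos - s_pos + 1
    raise ValueError("range does not cross a single exon/intron boundary")


print(count_bases("282-5", "285"))      # (5, 4)  CSRP3 example
print(count_bases("81493", "81493+2"))  # (2, 1)  TTN example
```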

In both cases we don't use a corresponding p. HGVS expression because the deletion removes the invariant region of the splice site sequence and thus is expected to disrupt splicing and lead to abnormal or absent protein.
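The reasoning here relies on the deleted intronic bases including the invariant splice-site dinucleotide: AG at c.X-2_X-1 of an acceptor, GT at c.X+1_X+2 of a donor, assuming canonical GT-AG introns. This sketch, with a hypothetical helper name, checks that against the deleted sequences quoted in the two examples; sequences are taken on the transcript strand as written in the c. expressions.

```python
def deletes_canonical_dinucleotide(deleted: str, intronic: int, side: str) -> bool:
    """Return True if the deleted intronic bases next to the exon match the
    canonical splice dinucleotide.

    side="upstream":   deletion starts in the intron before the exon, so its
                       first `intronic` bases end at c.X-1 (acceptor "AG").
    side="downstream": deletion ends in the intron after the exon, so its
                       last `intronic` bases start at c.X+1 (donor "GT").
    Assumes a canonical GT-AG intron.
    """
    if not 1 <= intronic <= len(deleted):
        raise ValueError("intronic must be between 1 and len(deleted)")
    deleted = deleted.upper()
    k = min(intronic, 2)  # how many dinucleotide bases fall inside the deletion
    if side == "upstream":
        intron_part = deleted[:intronic]
        return intron_part[-k:] == "AG"[-k:]
    if side == "downstream":
        intron_part = deleted[-intronic:]
        return intron_part[:k] == "GT"[:k]
    raise ValueError("side must be 'upstream' or 'downstream'")


print(deletes_canonical_dinucleotide("AACAGGTCC", 5, "upstream"))  # True (CSRP3 acceptor AG)
print(deletes_canonical_dinucleotide("GGT", 2, "downstream"))      # True (TTN donor GT)
```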

larrybabb commented 9 years ago

Thanks Shawn. I totally missed this. You are correct: the spanning section on the coordinate page is the key, and your example is spot on. And thanks to Steven for providing some additional examples.

Ronak, if you need any further help figuring out the representation of the two examples Steven provided, please let us know.

bpow commented 9 years ago

According to HGVS, I think the appropriate p. expressions when the canonical splice site is affected could be:

Of course, if the protein or transcript is analyzed, a more specific p. description could be given.