HGVSnomenclature / hgvs-nomenclature

HGVS Nomenclature website
https://hgvs-nomenclature.org/
MIT License

coding/non-coding identity #169

Closed: jtdendunnen closed this issue 1 month ago

jtdendunnen commented 5 months ago

As discussed in the HVNC meeting, the description of this rule should be the same for coding and non-coding reference sequences.

ifokkema commented 5 months ago

@jfjlaros I believe this was the sentence that you mentioned, correct?

jfjlaros commented 5 months ago

Indeed.

Additionally, instead of using "reference sequence", I would propose to use something like "transcript annotation", e.g., "it is not allowed to describe variants in nucleotides beyond the boundaries of the annotated transcript."

ifokkema commented 5 months ago

> Additionally, instead of using "reference sequence", I would propose to use something like "transcript annotation", e.g., "it is not allowed to describe variants in nucleotides beyond the boundaries of the annotated transcript."

I believe, however, that this remark is meant to exclude NR_123456.1:n.100+10del. If so, the current statement is fine. Although the statement also seems to disallow NC_000001.10(NR_123456.1):n.-100del, I do not believe that is what is meant here. Perhaps we should update the sentence to clarify what is meant: intronic positions (which then require the addition of a genomic reference sequence), or anything beyond the transcript boundaries (even when combined with a genomic reference sequence). If both are meant, this should be stated explicitly.

jfjlaros commented 5 months ago

Indeed, it should never be allowed to address a coordinate outside of the reference sequence, regardless of the coordinate system used. Perhaps we should make this a general remark.

If this was indeed the intention, then addressing genomic coordinates beyond the boundaries of the transcript was never a rule in the first place. If this is the case, then I would prefer a general remark and dropping the comments in the "coding" and "noncoding" sections.

jfjlaros commented 2 months ago

> When would we use +/- syntax?

When we use a genomic reference sequence.

For example, the description NG_012337.3(NM_003002.4):c.52+100del is valid because position c.52+100 indexes a G at position 5213 in the reference sequence of NG_012337.3. By contrast, the description NM_003002.4:c.52+100del is invalid because c.52+100 does not index any position in the reference sequence of NM_003002.4.
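To make the proposed general rule concrete, here is a minimal sketch in Python of the check described above: a position that lies outside the transcript (a +/- offset, or an n. position before n.1) is only acceptable when a genomic reference sequence is given in the GENOMIC(TRANSCRIPT) form. This is a hypothetical illustration of the rule as proposed in this thread, not the current HGVS guidelines or any existing tool. Assumptions: the regex handles only single-position deletions, NC_ and NG_ prefixes are treated as genomic references, and the helper names (`is_addressable` etc.) are invented for this example. It does not verify that the mapped genomic position actually falls inside the genomic sequence, and it does not flag c.-N positions, which may still lie within the 5' UTR of the transcript.

```python
import re

# Hypothetical, deliberately narrow pattern: single-position deletions only,
# e.g. "NM_003002.4:c.52+100del" or "NG_012337.3(NM_003002.4):c.52+100del".
# UTR "*" positions, ranges, and other variant types are not handled.
DESCRIPTION = re.compile(
    r"^(?P<outer>[A-Z]{2}_\d+\.\d+)"          # reference before the colon
    r"(?:\((?P<inner>[A-Z]{2}_\d+\.\d+)\))?"  # optional transcript in parentheses
    r":(?P<kind>[cn])\."                      # coding (c.) or non-coding (n.)
    r"(?P<pos>-?\d+)"                         # position relative to the transcript
    r"(?P<offset>[+-]\d+)?"                   # optional +/- offset
    r"del$"
)

# Assumption: NC_ and NG_ accessions are treated as genomic reference sequences.
GENOMIC_PREFIXES = ("NC_", "NG_")


def has_genomic_reference(m: re.Match) -> bool:
    """True for the GENOMIC(TRANSCRIPT) form, e.g. NG_012337.3(NM_003002.4)."""
    return m["inner"] is not None and m["outer"].startswith(GENOMIC_PREFIXES)


def lies_outside_transcript(m: re.Match) -> bool:
    """True if the position cannot be indexed in the transcript sequence alone."""
    if m["offset"] is not None:
        return True  # e.g. c.52+100: an intronic position
    if m["kind"] == "n" and int(m["pos"]) < 1:
        return True  # e.g. n.-100: upstream of the first transcript nucleotide
    return False


def is_addressable(description: str) -> bool:
    """Sketch of the proposed rule: never index outside the reference sequence.

    Positions outside the transcript are accepted only when a genomic reference
    is given. Whether the mapped genomic position lies inside that genomic
    sequence is NOT checked here; that would require the alignment data.
    """
    m = DESCRIPTION.match(description)
    if m is None:
        raise ValueError(f"unsupported description: {description!r}")
    return not lies_outside_transcript(m) or has_genomic_reference(m)


if __name__ == "__main__":
    for d in (
        "NG_012337.3(NM_003002.4):c.52+100del",
        "NM_003002.4:c.52+100del",
        "NR_123456.1:n.100+10del",
        "NC_000001.10(NR_123456.1):n.-100del",
    ):
        print(d, is_addressable(d))
```

Under this sketch, NC_000001.10(NR_123456.1):n.-100del is accepted, which reflects the general rule proposed by jfjlaros rather than the current wording discussed by ifokkema.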