Closed b0d0nne11 closed 5 months ago
After discussing this internally, we think the same applies to insertions.
In [1]: var_c = parse('NM_004985.4:c.567_*1insCCC')
In [2]: var_p = c_to_p(var_c)
In [3]: str(var_p)
Out[3]: 'NP_004976.2:p.(Ter189Ter)'
We expect this to return p.? as well. I'll extend my PR to handle these cases.
I agree that the current responses for both examples are wrong. However, what it should be is less clear to me.
Can you please elaborate on your rationale for p.? in these cases?
@reece For the mutations affected by this pull request, the entire coding sequence is unchanged and the added material is within the 3' UTR.
c.39_*1insA
c.12_*1dup
Therefore, these are 3' UTR mutations. All other 3' UTR mutations get p.?, so these mutations should also get p.?
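To make the rule above concrete, here is a minimal sketch of the check being argued for. It is not the code in the pull request and does not use the hgvs package: the regex and the function name `adds_material_only_in_3utr` are hypothetical, and it only understands simple `dup` and `ins` edits on c. coordinates. The assumption is that a dup or ins whose end position is written with a `*` places the new bases after the stop codon, leaving the coding sequence untouched.

```python
import re

# Hypothetical pattern covering only c. dup/ins variants (illustration only).
_C_VARIANT = re.compile(
    r":c\."
    r"(?P<start>[-*]?\d+(?:[+-]\d+)?)"          # e.g. 2959, *1, -14
    r"(?:_(?P<end>[-*]?\d+(?:[+-]\d+)?))?"      # optional end position
    r"(?P<edit>dup|ins[ACGT]+)$"                # only dup and ins are handled
)


def adds_material_only_in_3utr(hgvs_c: str) -> bool:
    """Return True if a dup/ins ends at a '*' (3' UTR) position.

    For these edits the new bases land after the end position, so the
    coding sequence up to and including the stop codon is unchanged.
    Anything the pattern does not recognise (e.g. a deletion) returns False.
    """
    match = _C_VARIANT.search(hgvs_c)
    if match is None:
        return False
    end = match.group("end") or match.group("start")
    return end.startswith("*")


if __name__ == "__main__":
    for variant in (
        "NM_153223.3:c.2959_*1dup",
        "NM_004985.4:c.567_*1insCCC",
        "NM_004985.4:c.567_*1del",
    ):
        print(variant, adds_material_only_in_3utr(variant))
```

Deletions are deliberately excluded: something like c.567_*1del would remove part of the stop codon, so the "coding sequence is unchanged" argument does not carry over.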
What is your source for the variant representation NM_153223.3:c.2959_*1dup? Did you call g_to_c previously?
If we try to represent the underlying genomic event that causes this variant and use the left-shuffled insertion representation, I believe we end up with NC_000005.10:g.123346517_123346518insATTA. Performing g_to_c on this representation results in NM_153223.3:c.*1_*2insTAAT, and c_to_p then yields p.? for it. So this issue is also related to the ins->dup rule in the hgvs conventions.
To be honest, I am not a big fan of this hgvs-dup "prioritization" rule. In my opinion it modifies the underlying nature of the genomic event and drastically changes the coordinates. For most small variants we would often be better off without the dup representation. Your variant is one example of why.
Btw, if I plug in right-shuffled coordinates for this variant I end up with p.(=). I am not sure which of the two hgvs_p is "better".
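For readers unfamiliar with "shuffling", the sketch below shows why one insertion inside a repeat has several equivalent coordinates. It uses a made-up 8-base reference, not the real NC_000005 sequence, and the helpers `shift_left`, `shift_right` and `apply_insertion` are hypothetical names, not functions from the hgvs package. Positions are 0-based: an insertion at `pos` goes immediately before `ref[pos]`.

```python
def apply_insertion(ref: str, pos: int, ins: str) -> str:
    """Return the alternate sequence with `ins` placed before ref[pos]."""
    return ref[:pos] + ins + ref[pos:]


def shift_left(ref: str, pos: int, ins: str) -> tuple[int, str]:
    """Move the insertion as far left as possible without changing the result."""
    while pos > 0 and ref[pos - 1] == ins[-1]:
        ins = ins[-1] + ins[:-1]  # rotate the inserted bases by one
        pos -= 1
    return pos, ins


def shift_right(ref: str, pos: int, ins: str) -> tuple[int, str]:
    """Move the insertion as far right as possible without changing the result."""
    while pos < len(ref) and ref[pos] == ins[0]:
        ins = ins[1:] + ins[0]  # rotate the inserted bases by one
        pos += 1
    return pos, ins


if __name__ == "__main__":
    ref = "CCATTAGG"            # toy reference containing one ATTA copy
    left = shift_left(ref, 6, "ATTA")
    right = shift_right(ref, 2, "ATTA")
    print(left, right)
    # Both placements describe the same alternate sequence.
    print(apply_insertion(ref, *left) == apply_insertion(ref, *right))
```

The two placements yield an identical sequence, which is why which end of the repeat a tool reports (and on which strand) decides whether the variant appears to touch the stop codon.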
@andreasprlic
We are just attempting to follow the guidelines as they exist, which say that if you can represent something as a dup, it must be represented as a dup, and that nomenclature should be 3' shifted. The cdot nomenclature NM_153223.3:c.2959_*1dup is correct HGVS nomenclature according to those rules, and the pull request fixes a bug where the pdot is assigned incorrectly.
The key point for the examples in the pull request is that the inserted material is inserted AFTER the stop codon, in the sense that the ribosome will make it all the way to the stop codon and not encounter any mutation. Therefore, in the pull request these variants are identified as being in the 3' UTR region, and then end up with p.? like any other 3' UTR variant.
To answer your initial question, the cdot NM_153223.3:c.2959_*1dup comes from calling g_to_c on NC_000005.9:g.122682212_122682215dup, which is itself the left-shifted version of the correct gdot (NC_000005.9:g.122682216_122682219dup), because the transcript is negative strand.
@reece I feel this example demonstrates a problem with the hgvs recommendation to represent insertions as duplications where appropriate. The dup changes the underlying nature (coordinates) of the event, and as a consequence we have problems with the hgvs_p here. I believe you are involved in some of the discussions about the future of hgvs. Is the ins->dup recommendation something that could get more nuance? Perhaps insertions don't need to be changed to duplications at the chromosomal level, and the conversion is only recommended at the protein level?
@andreasprlic @reece
I think we probably all agree that returning p.Met1? is completely wrong for NM_153223.3:c.2959_*1dup.
This pull request returns p.? instead, which is the same thing returned for NM_153223.3:c.*1_*2insTAAT, which is what the cdot would be if hgvs guidelines were changed to eliminate dups.
Based on that, can this pull request be merged, and future changes to hgvs guidelines be dealt with separately?
@andreasprlic @reece
By the way, the pull request also fixes non-duplication insertions just after the stop codon. The second unit test added is
NM_004985.4:c.567_*1insCCC --> p.?
@gostachowiak apologies for the slow response. Yes, p.Met1? is definitely completely wrong. You refer to "this pull request" - do you mean #716? I took a look at the latest version of that and it looks much more concise now, so I approved it.
@andreasprlic yes I meant #716. Thanks!
Any duplications with an end position at or past the stop codon should be classified as 3'UTR regardless of start position. Currently, mapping NM_153223.3:c.2959_*1dup yields NP_694955.2:p.(Met1?). We believe that should map to NP_694955.2:p.? because all other variants in the UTR map to p.?
We expect this to result in NP_694955.2:p.? instead.