Closed by larrybabb 7 months ago
It looks like this may be because the original sequence NC_000016.9:g.89306725_89339913 is bounded on the left and right by a T.
Original sequence length: 33188
Normalized sequence length: 33190
VRS repeatSubunitLength: 33189
original_sequence[:20]='TCGAGACCAGCCTGGCCAAC'
normalized_sequence[:20]='TTCGAGACCAGCCTGGCCAA'
original_sequence[-20:]='AGGTCAAGAGATCGAGACCA'
normalized_sequence[-20:]='GGTCAAGAGATCGAGACCAT'
(from a notebook showing the T before and after the reference sequence: https://gist.github.com/theferrit32/c9347dd7a5db88b986055bfde7f434c4)
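As a quick sanity check on the printed values above, the following sketch confirms that the normalized sequence is the original extended by one T on each side. It only uses the 20-base prefixes and suffixes and the two lengths quoted in this thread; the variable names are made up for illustration and nothing here calls vrs-python.

```python
# Prefixes and suffixes copied from the notebook output quoted above.
original_head = "TCGAGACCAGCCTGGCCAAC"
normalized_head = "TTCGAGACCAGCCTGGCCAA"
original_tail = "AGGTCAAGAGATCGAGACCA"
normalized_tail = "GGTCAAGAGATCGAGACCAT"

# The normalized prefix should be one extra T followed by the original prefix.
head_shifted = normalized_head == "T" + original_head[:-1]
# The normalized suffix should be the original suffix followed by one extra T.
tail_shifted = normalized_tail == original_tail[1:] + "T"
# One extra base on each side accounts for the difference in reported lengths.
length_delta = 33190 - 33188

print(head_shifted, tail_shifted, length_delta)  # True True 2
```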
So the result is correct (I think). But maybe we want to clarify or change how we handle large deletions. Would state.sequence="" and state.length=0 also be an accurate reflection of the variant state, and be clearer? And should we not roll left/right?
The current behavior looks correct to me. And the normalized form needs to be state.sequence="T" and state.length=1 because both NC_000016.9:g.89306725_89339913del and NC_000016.9:g.89306724_89339912del would normalize to the same location and state.
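To illustrate why two shifted deletions collapse to one representation, here is a minimal sketch on a toy sequence. It is a simplified stand-in for fully justified normalization, not vrs-python's actual implementation; roll_bounds and normalize_deletion are hypothetical helpers, the coordinates are 0-based interbase, and the output keys merely mirror the RLE field names used in this thread.

```python
def roll_bounds(ref, start, end):
    """Return the leftmost start and rightmost end that the deletion of
    ref[start:end] can be shifted to without changing the resulting sequence."""
    length = end - start
    left = start
    # Shifting left by one is equivalent when the base entering the deleted
    # span on the left equals the base leaving it on the right.
    while left > 0 and ref[left - 1] == ref[left - 1 + length]:
        left -= 1
    right = end
    # Shifting right by one is equivalent when the base entering on the right
    # equals the base leaving on the left.
    while right < len(ref) and ref[right] == ref[right - length]:
        right += 1
    return left, right


def normalize_deletion(ref, start, end):
    """Expand a deletion over its whole ambiguous region (toy sketch)."""
    length = end - start
    left, right = roll_bounds(ref, start, end)
    # What remains of the expanded region once `length` bases are deleted.
    remaining = ref[left + length:right]
    return {
        "start": left,
        "end": right,
        "length": len(remaining),
        "repeatSubunitLength": length,
        "sequence": remaining,
    }


# Toy reference: the deleted span "TCGA" starts with T and is followed by T.
ref = "ACTCGATG"
a = normalize_deletion(ref, 2, 6)  # delete "TCGA"
b = normalize_deletion(ref, 3, 7)  # delete "CGAT", the same edit shifted by one
print(a)
print(a == b)
```

Both inputs produce the expanded location [2, 7) with a one-base state of T. A state of sequence="" and length=0 on the unexpanded location would leave the two equivalent deletions with different locations, which appears to be the point made above.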
@ahwagner I can't get my head around this one. I believe this is a bug. Please verify.
When I run the following HGVS expression through vrs-python's translate_from(), I get an RLE state with a length of 1, a repeatSubunitLength of 33189, and a sequence of T. In the following code block I'm showing the actual sequence surrounding the positions being deleted. With the before and after bases shown in this example, I'm fairly confident that the entire 33,189 bp span of sequence is repeating. So, I believe we have an error somewhere in this logic.