Open reece opened 8 years ago
Original comment by Reece Hart (Bitbucket: reece, GitHub: reece):
This is doable, but hard. The major challenge is that there are a large number of coordinate types (simple, base-offset with seq start datum, base-offset with cds start datum, base-offset with cds end datum), position types (range, interval), and uncertainty. Coordinate types are associated with the variant type (c, g, m, n, r, p), and position types are associated with the edit type (del, ins, etc.). Addressing this issue requires enumerating all combinations.
We should tackle this, but there are enough big changes in 0.5.0 currently, and this would be a fairly big change. Let's defer to a future release.
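To give a rough sense of the size of that enumeration, here is a minimal sketch that builds the cross product of variant type, position type, and edit type. The edit-type list is an illustrative subset, the point/range labels follow the wording of the issue description, and uncertainty is left out entirely; none of these names come from the hgvs package.

```python
from itertools import product

# Variant types as listed in the comment above.
VARIANT_TYPES = ("c", "g", "m", "n", "r", "p")
# Position shapes, using the point/range wording from the issue description.
POSITION_TYPES = ("point", "range")
# Illustrative subset of edit types (an assumption, not an exhaustive list).
EDIT_TYPES = ("sub", "del", "ins", "dup", "delins", "inv")


def enumerate_combinations():
    """Return every (variant type, position type, edit type) triple."""
    return list(product(VARIANT_TYPES, POSITION_TYPES, EDIT_TYPES))


combos = enumerate_combinations()
print(len(combos))  # 6 * 2 * 6 = 72
```

Even without uncertainty, that is 72 cases to classify as valid or invalid, which is roughly why this looks like a big change.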
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.
These currently fail validation:
validate(hp.parse_hgvs_variant('NM_004260.3:n.2338insC'))
HGVSInvalidVariantError: insertion length must be 1
validate(hp.parse_hgvs_variant("NM_001637.3:c.1582_1583G>A"))
HGVSInvalidVariantError: NM_001637.3:c.1582_1583G>A: Variant reference (G) does not agree with reference sequence (GG)
- Are we OK with this being caught in validate() rather than in the parser? If so, we can just close this issue.
- Or, if it needs to be fixed, what should we do?
I can probably fix this if we can agree where (see bullet points above).
Originally reported by: Reece Hart (Bitbucket: reece, GitHub: reece)
The parser decomposes PosEdits into Positions and Edits.
Positions are modeled as Intervals with start and end. Point positions are converted to Intervals with start==end. The current grammar therefore allows point positions and ranges to be accepted interchangeably when it should not.
Two specific consequences of this design are that the grammar accepts SNVs with a range and insertions with a point position. Examples (both of which are incorrect):
Insertions should always require a range (e.g., 2338_2339) and substitutions should always require a point position.
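The rule above could be sketched as a small standalone check. This is only an illustration under stated assumptions: `Interval` and `check_position_for_edit` are hypothetical names rather than hgvs classes, positions are plain integers with no offsets or datums, and the requirement that an insertion span two adjacent positions is inferred from the "insertion length must be 1" validator message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical stand-in for a parsed position; a point has start == end."""
    start: int
    end: int

    @property
    def is_point(self) -> bool:
        return self.start == self.end


def check_position_for_edit(edit_type: str, pos: Interval) -> None:
    """Raise ValueError if the position shape does not fit the edit type."""
    if edit_type == "ins":
        # Assumption: an insertion sits between two adjacent positions.
        if pos.end - pos.start != 1:
            raise ValueError(
                "insertion requires two adjacent positions, e.g. 2338_2339"
            )
    elif edit_type == "sub":
        if not pos.is_point:
            raise ValueError("substitution requires a single point position")
    # Other edit types are not constrained in this sketch.


check_position_for_edit("ins", Interval(2338, 2339))  # accepted
check_position_for_edit("sub", Interval(1582, 1582))  # accepted
```

Because a point is stored as an interval with start == end, a check like this cannot tell `2338` apart from `2338_2338`. Rejecting the latter would have to happen in the grammar or by recording the written form at parse time.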