biocommons / hgvs

Python library to parse, format, validate, normalize, and map sequence variants. `pip install hgvs`
https://hgvs.readthedocs.io/
Apache License 2.0

Modify parser to include common errors to provide better error messages than "syntax error" #720

Open davmlaw opened 8 months ago

davmlaw commented 8 months ago

The current design uses Parsley to define only the correct grammar, so common mistakes produce nothing more specific than a generic "syntax error".

If we modified the grammar parser to include common mistakes, then manually checked for them, we could give much more informative messages than "syntax error".

For instance, consider an insertion:

NC_000017.11:g.43091687_43091688insGATTACA

Here are some common mistakes I have seen in the wild:

Integer length for insertion

Example: NC_000017.11:g.43091687_43091688ins7
Suggested error: "Insertions require inserted sequence, rather than an integer length"

Missing insertion

Example: NC_000017.11:g.43091687_43091688ins
Suggested error: "Insertions require inserted sequence"
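
To make the two cases above concrete, here is a minimal sketch of the "manually check for common mistakes" step. It is not how hgvs works today, and it uses plain regular expressions rather than extra Parsley grammar rules, purely so the example is self-contained. The function name `diagnose_insertion_error` and both patterns are hypothetical and exist only for illustration.

```python
import re
from typing import Optional

# Hypothetical patterns for the two mistakes listed above; not part of hgvs.
# Both are anchored to the end of the string, where the inserted sequence belongs.
_INS_INTEGER_LENGTH = re.compile(r"ins\d+$")
_INS_MISSING_SEQUENCE = re.compile(r"ins$")


def diagnose_insertion_error(hgvs_string: str) -> Optional[str]:
    """Return a specific message for a known insertion mistake, else None."""
    s = hgvs_string.strip()
    if _INS_INTEGER_LENGTH.search(s):
        return "Insertions require inserted sequence, rather than an integer length"
    if _INS_MISSING_SEQUENCE.search(s):
        return "Insertions require inserted sequence"
    # Not a mistake this sketch knows about; the caller would fall back
    # to the generic "syntax error".
    return None


for example in (
    "NC_000017.11:g.43091687_43091688ins7",
    "NC_000017.11:g.43091687_43091688ins",
    "NC_000017.11:g.43091687_43091688insGATTACA",
):
    print(example, "->", diagnose_insertion_error(example))
```

A check like this could run only after the normal parse fails, so valid variants never pay for it and the generic error remains the fallback for mistakes that are not recognised.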

davmlaw commented 8 months ago

I am happy to do the work and make a pull request; I am just raising the issue for discussion first.

davmlaw commented 7 months ago

Potentially related to #367

github-actions[bot] commented 4 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.