CountESS-Project / CountESS

BSD 3-Clause "New" or "Revised" License

More flexible Variant Calling #23

Closed (nickzoic closed 7 months ago)

nickzoic commented 1 year ago

At the moment, the Variant Translator plugin takes a reference sequence and does a reasonable job of turning sequences into "g."-type HGVS strings, but there's no support for protein variants or for grouping variations into triplets (e.g. for coding sequences).

We could also do protein variant calling by translating the before and after sequences to IUPAC single-letter protein sequences, then calling Levenshtein on them, then translating the single-letter codes to the three-letter codes used in HGVS.
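A rough sketch of that idea, to make the steps concrete. This is not the plugin's code: `translate` and `protein_substitutions` are hypothetical names, it assumes the standard genetic code and a reading frame starting at the first base, and it stands in a simple position-by-position comparison for the Levenshtein step, so it only handles sequences that translate to the same length.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
_BASES = "TCAG"
_AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): a for c, a in zip(product(_BASES, repeat=3), _AMINO)}

# Single-letter to three-letter amino acid codes; "*" is the stop codon.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate(dna: str) -> str:
    """Translate DNA to single-letter protein codes, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def protein_substitutions(ref_dna: str, var_dna: str) -> list[str]:
    """Return "p."-style substitution strings for each changed residue.

    Simplification: compares residues position by position instead of
    aligning with Levenshtein, so both proteins must be the same length.
    """
    ref_protein = translate(ref_dna)
    var_protein = translate(var_dna)
    if len(ref_protein) != len(var_protein):
        raise ValueError("length change: needs a real alignment step")
    return [
        f"p.{THREE_LETTER[r]}{pos}{THREE_LETTER[v]}"
        for pos, (r, v) in enumerate(zip(ref_protein, var_protein), start=1)
        if r != v
    ]


print(protein_substitutions("ATGTGGAAA", "ATGTGTAAA"))  # ['p.Trp2Cys']
```

Synonymous changes come back as an empty list here; how to report those, and how to combine several changes into one HGVS string, is left open.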

It's likely that misalignment due to single-base insertions or deletions (frameshifts) will cause big changes at the protein level, so those sequences will get thrown out by the max_mutations limit.
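To illustrate the frameshift concern with a toy example: the sketch below counts how many residues differ after a single-base substitution versus a single-base insertion. The helper names and the `MAX_MUTATIONS = 3` value are made up for illustration and are not taken from the plugin; the comparison is a naive position-by-position count rather than a Levenshtein distance.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
_AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): a for c, a in zip(product("TCAG", repeat=3), _AMINO)}


def translate(dna: str) -> str:
    """Translate DNA to single-letter protein codes, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def count_protein_mismatches(ref_dna: str, var_dna: str) -> int:
    """Count residues that differ position by position, plus any length difference."""
    ref_protein = translate(ref_dna)
    var_protein = translate(var_dna)
    differing = sum(r != v for r, v in zip(ref_protein, var_protein))
    return differing + abs(len(ref_protein) - len(var_protein))


MAX_MUTATIONS = 3  # hypothetical threshold, for illustration only

REF = "ATGGCTGCTGCTGCTGCT"        # translates to MAAAAA
SUBSTITUTED = "ATGCCTGCTGCTGCTGCT"  # one base changed: G -> C at position 4
INSERTED = "ATGCGCTGCTGCTGCTGCT"    # one base inserted: C after position 3

for label, seq in [("substitution", SUBSTITUTED), ("insertion", INSERTED)]:
    n = count_protein_mismatches(REF, seq)
    verdict = "kept" if n <= MAX_MUTATIONS else "discarded"
    print(f"{label}: {n} residue(s) changed, {verdict}")
```

With these inputs the substitution changes one residue, while the insertion shifts the frame and changes every residue after the first, which is what would push it over a small mutation limit.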

nickzoic commented 1 year ago

Actually looking at http://varnomen.hgvs.org/recommendations/protein/variant/substitution/ the syntax is a bit different, eg:

This might be a lot harder than I initially thought.

nickzoic commented 1 year ago

I've made a start on this work at https://github.com/nickzoic/countess-variants

nickzoic commented 7 months ago

This got more-or-less-fixed-for-now in 51435699358536edb1d55f6b73f9ca1cc85581e5 (merged in v0.0.44) although a more sophisticated approach would be welcomed as a plugin.