DASL-Lab / provoc

PROportions of Variants of Concern using counts, coverage, and a variant matrix.
https://dasl-lab.github.io/provoc/
MIT License

Update parse_mutations.R #19

Closed: danerkestey closed 6 months ago

danerkestey commented 6 months ago

In this optimized version, I minimized string operations and redundant conversions for parse_unique_mutations:

DBecker7 commented 6 months ago

This won't work - you've removed the call to provoc:::parse_mutation(), so no actual parsing is done. You can see a few test cases in tests/parse_mutations.R where something like ~27785A gets parsed into aa:orf7b:Y10*. The parse_mutation() function is the workhorse and contains all of the bioinformatic information required to determine the amino acid position within an open reading frame.

parse_mutations() should handle the string operations more efficiently, then pass the cleaned mutation strings to parse_mutation(). The parse_mutation() function itself could be vectorized for speed, but the amino acid information it contains needs to remain.
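As a rough illustration of that split, here is a minimal sketch in which the wrapper does only the string cleanup and de-duplication, then delegates each unique string to a per-mutation parser. toy_parse_mutation() and parse_mutations_sketch() are hypothetical names invented for this example; the toy parser is only a placeholder for provoc:::parse_mutation() and contains none of its bioinformatic logic. The input "~100T" is made up for illustration.

```r
# Hypothetical placeholder for provoc:::parse_mutation(). The real function
# maps a nucleotide change to an amino acid change within an open reading
# frame; this toy version only tags its input so the control flow is visible.
toy_parse_mutation <- function(m) paste0("parsed:", m)

# Sketch of a wrapper: clean the strings, parse each unique string once,
# then map the parsed values back onto the original order.
parse_mutations_sketch <- function(labels, parse_one = toy_parse_mutation) {
  labels <- trimws(as.character(labels))
  unique_labels <- unique(labels)
  # One call to the per-mutation parser per unique string.
  parsed <- vapply(unique_labels, parse_one, character(1), USE.NAMES = FALSE)
  parsed[match(labels, unique_labels)]
}

labels <- c("~27785A", " ~27785A", "~100T", "~27785A")
result <- parse_mutations_sketch(labels)
print(result)
```

The point of the design is that any speedup comes from calling the parser fewer times, not from skipping it, so the amino acid logic stays in one place.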

Sorry, I believe it's my fault for not pointing you to the tests, and I now see that Baaijens does not have the mutations column (I thought it did). For the next steps, look at the output of parse_mutations(Baaijens$label) for the original function, and make sure the new version produces the same results.
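One way to run that comparison is a small helper that calls both versions on the same input and reports any rows where they disagree. This is only a sketch: compare_parsers() is a hypothetical helper, not part of provoc, and it assumes both functions return a character vector with one element per input. The two toy functions below merely stand in for the original and rewritten parse_mutations().

```r
# Hypothetical helper: run two implementations on the same input and
# return a data frame of the positions where their outputs differ.
compare_parsers <- function(old_fn, new_fn, input) {
  old_out <- old_fn(input)
  new_out <- new_fn(input)
  if (length(old_out) != length(new_out)) {
    stop("outputs differ in length: ", length(old_out), " vs ", length(new_out))
  }
  # Treat two NAs as equal; treat NA against a non-NA value as a difference.
  differs <- !(old_out == new_out | (is.na(old_out) & is.na(new_out)))
  differs[is.na(differs)] <- TRUE
  data.frame(input = input[differs],
             old = old_out[differs],
             new = new_out[differs])
}

# Toy stand-ins for the original and rewritten functions (assumptions).
old_version <- function(x) toupper(x)
new_version <- function(x) toupper(trimws(x))

mismatches <- compare_parsers(old_version, new_version, c("a", " b", "c"))
print(mismatches)
```

With the package loaded and a copy of the original function saved under some name, the same call on Baaijens$label should return zero rows if the rewrite preserves behaviour.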

danerkestey commented 6 months ago

Addressed comments