Closed danerkestey closed 6 months ago
This won't work - you've removed the call to `provoc:::parse_mutation()`, so no actual parsing is done. You can see a few test cases in `tests/parse_mutations.R`, where something like `~27785A` gets parsed into `aa:orf7b:Y10*`. The `parse_mutation()` function is the workhorse and contains all of the bioinformatic information required to determine the amino acid position within an open reading frame. `parse_mutations()` should better handle the string operations, then pass them to `parse_mutation()`. The actual `parse_mutation()` function could be vectorized for speed improvements, but the amino acid information needs to remain.
Sorry, I believe it's my fault for not pointing you to the tests, and I now see that Baaijens does not have the `mutations` column (I thought it did). For the next steps, look at the output of `parse_mutations(Baaijens$label)` for the original function, and make sure the new version produces the same results.
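One way to run that comparison is sketched below. This is only an illustration: `check_same_output()`, `old_impl()`, and `new_impl()` are hypothetical names, and the two toy functions stand in for the original and rewritten `parse_mutations()` so the snippet runs without the package.

```r
# Hypothetical helper: run an old and a new implementation on the same input
# and report whether their outputs agree.
check_same_output <- function(old_fn, new_fn, input) {
  old_out <- old_fn(input)
  new_out <- new_fn(input)
  isTRUE(all.equal(old_out, new_out))
}

# Toy stand-ins (NOT the provoc functions) so the sketch is self-contained.
old_impl <- function(x) vapply(x, toupper, character(1), USE.NAMES = FALSE)
new_impl <- function(x) toupper(x)

labels <- c("a", "b", "a")
same <- check_same_output(old_impl, new_impl, labels)
```

In the package itself, the same helper could presumably be called with a saved copy of the original function, the new `parse_mutations()`, and `Baaijens$label` as the input.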
Addressed comments
In this optimized version, I minimized string operations and redundant conversions for `parse_unique_mutations`:

- `unique(muts)` is computed outside the loop to avoid redundant calculations
- `unique_muts` is used in place of the nested loops
- `stringsAsFactors = FALSE` is set to make sure character columns aren't converted to factors
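A minimal sketch of the pattern described here, under stated assumptions: `parse_one()` is a toy placeholder for the per-mutation parser (it is not `provoc:::parse_mutation()` and carries none of its amino acid logic), and `parse_unique()` is a hypothetical name rather than the function in this PR.

```r
# Toy stand-in for an expensive per-mutation parser (NOT provoc:::parse_mutation).
parse_one <- function(mut) paste0("parsed:", mut)

# Hypothetical wrapper: parse each distinct mutation once, then map back.
parse_unique <- function(muts) {
  unique_muts <- unique(muts)  # computed once, outside any loop
  parsed <- vapply(unique_muts, parse_one, character(1), USE.NAMES = FALSE)
  data.frame(
    label = muts,
    # match() gives, for each original entry, its position in unique_muts
    mutation = parsed[match(muts, unique_muts)],
    stringsAsFactors = FALSE   # keep character columns as character
  )
}

res <- parse_unique(c("~27785A", "other", "~27785A"))
```

The parser runs once per distinct value, and `match()` restores the original order and duplicates, so the output has one row per input element.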