Open DBecker7 opened 6 months ago
The main `provoc()` function should look the same to the user, but it actually just prepares data and dispatches to an aux function. The machinery of `process_optim()` should be changed to accommodate this. Instead of grouping by the `by_col`, it should take in a fused data set and a method argument. I should keep the input validation inside `process_optim()`.
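A rough sketch of what the reworked worker could look like, assuming it receives an already-fused data frame plus a single method name. The function name `process_optim_sketch()`, the toy columns, and the placeholder solvers are all invented here for illustration; none of this is the package's actual signature.

```r
# Hypothetical worker: takes fused data and a method, validates internally.
process_optim_sketch <- function(fused, method = c("provoc", "freyja", "alcov")) {
    # Validation stays inside the worker, as proposed above.
    method <- match.arg(method)
    stopifnot(is.data.frame(fused), nrow(fused) > 0)

    # Placeholder solvers; each would be swapped for the real fitting routine.
    solver <- switch(method,
        provoc = function(d) "provoc fit placeholder",
        freyja = function(d) "freyja fit placeholder",
        alcov  = function(d) "alcov fit placeholder"
    )
    list(method = method, n_rows = nrow(fused), fit = solver(fused))
}

# Toy fused data; the column names are assumptions.
toy_fused <- data.frame(count = c(5, 10), coverage = c(20, 20))
res <- process_optim_sketch(toy_fused, method = "freyja")
```

Using `match.arg()` means an unknown method fails early with a clear error, before any optimisation starts.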
This dispatching strategy should also handle the `by_col` argument, and produce results with relevant headings (`by_col`, `method`, `lineage_defs`, `formula`, etc). Lineage defs should be named, or a function could try to guess their names (e.g. if it sees that the row names of one are nested in another, they would be base and base_). `provoc()` can use `expand.grid(formula = formula, by = unique(data[, by_col]), lineage_def = names(lineage_defs))` to produce a data frame, then loop through the rows and send the relevant data to `process_optim()`.
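The `expand.grid()` loop described above might look roughly like this. The toy data, the column names, the `base`/`extended` list names, and `process_optim_stub()` are all made up for illustration, and the formula is stored as a string only so that it fits in a data frame column.

```r
# Toy inputs; column names and lineage matrices are invented for illustration.
data <- data.frame(
    sample   = c("s1", "s1", "s2", "s2"),
    mutation = c("m1", "m2", "m1", "m2"),
    count    = c(5, 0, 3, 7),
    coverage = c(10, 10, 10, 10)
)
by_col <- "sample"
lineage_defs <- list(
    base = matrix(c(1, 0), nrow = 1,
        dimnames = list("B.1.1.7", c("m1", "m2"))),
    extended = matrix(c(1, 0, 0, 1), nrow = 2, byrow = TRUE,
        dimnames = list(c("B.1.1.7", "B.1.617.2"), c("m1", "m2")))
)

# One row per run: every group crossed with every named lineage definition.
runs <- expand.grid(
    formula = "count / coverage ~ .",
    by = unique(data[, by_col]),
    lineage_def = names(lineage_defs),
    stringsAsFactors = FALSE
)

# Stand-in for process_optim(): only reports the size of what it was given.
process_optim_stub <- function(sub_data, lineage_def) {
    data.frame(n_mutations = nrow(sub_data), n_lineages = nrow(lineage_def))
}

# Loop over the rows, send the relevant data to the worker, keep the headings.
results <- do.call(rbind, lapply(seq_len(nrow(runs)), function(i) {
    run <- runs[i, ]
    sub_data <- data[data[[by_col]] == run$by, , drop = FALSE]
    out <- process_optim_stub(sub_data, lineage_defs[[run$lineage_def]])
    data.frame(run, out, row.names = NULL)
}))
```

Each row of `results` carries the `formula`, `by`, and `lineage_def` values that produced it, so the output is labelled without extra bookkeeping.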
This is an opportunity to fix many inefficiencies. For instance, `fuse()` gets called a couple of times. This is unnecessary: it can be called once at the start, or it can be called individually for each run. It's probably best to run it individually for each run right now, since there may be several lineage definitions provided and we can just loop through an `expand.grid()` object defined by the argument lists.
The main `provoc()` function would still work as intended, but would accept list arguments for many of the parameters. The computations would be moved to another function that is easy to call from `provoc()`. All of the pre-computations would be the same for the lists of arguments, so this would be more efficient.

Some ideas:
- `method = c("provoc", "freyja", "alcov")` would fit the same data/mutations with all three methods.
- `lineages = list(c("B.1.1.7", "B.1.617.2"), c("B.1.1.7", "B.1.617.2", "AY.4"))` would fit the model with each set of lineages (using the same `mutation_defs` matrix).
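As a sketch of how the two ideas could combine, the methods and lineage sets can be crossed into a grid, with each run taking its rows from one shared matrix. The contents of `mutation_defs` and the assumption that lineages are rows and mutations are columns are invented for this example.

```r
# Hypothetical mutation definitions: lineages as rows, mutations as columns.
mutation_defs <- matrix(
    c(1, 1, 0,
      0, 1, 1,
      1, 0, 1),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("B.1.1.7", "B.1.617.2", "AY.4"), c("m1", "m2", "m3"))
)

method <- c("provoc", "freyja", "alcov")
lineages <- list(
    c("B.1.1.7", "B.1.617.2"),
    c("B.1.1.7", "B.1.617.2", "AY.4")
)

# Index the lineage sets so each grid row points back into the list.
grid <- expand.grid(
    method = method,
    lineage_set = seq_along(lineages),
    stringsAsFactors = FALSE
)

# Each run gets the rows of the shared matrix that belong to its lineage set.
defs_per_run <- lapply(grid$lineage_set, function(k) {
    mutation_defs[lineages[[k]], , drop = FALSE]
})
```

Storing an index rather than the lineage vectors themselves keeps the grid a plain data frame while the list holds the variable-length sets.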