DASL-Lab / provoc

PROportions of Variants of Concern using counts, coverage, and a variant matrix.
https://dasl-lab.github.io/provoc/
MIT License

Superfunction to run different model specs #42

Open DBecker7 opened 6 months ago

DBecker7 commented 6 months ago

The main provoc() function would still work as intended, but it would accept lists for many of its parameters. The computations would move to a separate function that is easy to call from provoc(). The pre-computations are the same for every combination of arguments, so doing them once would be more efficient than calling provoc() repeatedly.
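One way to let a parameter accept either a single value or a list is to normalise it to a list up front. This is only a sketch: as_arg_list() is a hypothetical helper, not something in provoc, and the formulas are made up.

```r
# Hypothetical helper (not part of provoc): wrap a single value in a list so
# that single-value and list arguments can be handled the same way downstream.
as_arg_list <- function(x) {
  # Data frames are lists internally, so treat them as single values.
  if (is.list(x) && !is.data.frame(x)) x else list(x)
}

# A formula is not a list, so it gets wrapped; a list of formulas is kept.
single <- as_arg_list(count ~ lineage_a)
several <- as_arg_list(list(count ~ lineage_a, count ~ lineage_a + lineage_b))

length(single)   # 1
length(several)  # 2
```

After this step the rest of the code can always loop over a list, whether the user passed one formula or several.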

Some ideas:

DBecker7 commented 5 months ago

The main provoc() function should look the same to the user, but it actually just prepares data and dispatches to an aux function.

The machinery of process_optim() should be changed to accommodate this. Instead of grouping by the by_col, it should take in a fused data set and a method argument. I should keep the input validation inside process_optim().

This dispatching strategy should also handle the by_col argument, and produce results with relevant headings (by_col, method, lineage_defs, formula, etc). Lineage defs should be named, or a function could try to guess their names (e.g. if it sees that the row names of one are nested in another, they would be base and base_). This way, the wrapper function provoc() can use expand.grid(formula = formula, by = unique(data[, by_col]), lineage_def = names(lineage_defs)) to produce a data frame, then loop through the rows and send the relevant data to process_optim().
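A rough sketch of that loop, under some assumptions: run_one() is a hypothetical stand-in for process_optim(), the data and lineage definitions are toy values, and formulas are indexed by position because I am not sure expand.grid() handles a list of formulas cleanly.

```r
# Toy inputs; none of these objects come from provoc.
formulas <- list(count ~ a + b, count ~ a)
lineage_defs <- list(
  base = matrix(c(1, 0), nrow = 1, dimnames = list("a", c("m1", "m2"))),
  extended = matrix(c(1, 0, 0, 1), nrow = 2,
                    dimnames = list(c("a", "b"), c("m1", "m2")))
)
data <- data.frame(sample = c("s1", "s1", "s2"), count = c(5, 2, 7))
by_col <- "sample"

# Hypothetical stand-in for process_optim(): reports only what it was given.
run_one <- function(data_subset, formula, lineage_def) {
  data.frame(formula = deparse(formula), n_obs = nrow(data_subset),
             n_lineages = nrow(lineage_def), stringsAsFactors = FALSE)
}

# One row per combination of formula, by-group, and lineage definition.
grid <- expand.grid(
  formula = seq_along(formulas),
  by = unique(data[[by_col]]),
  lineage_def = names(lineage_defs),
  stringsAsFactors = FALSE
)

# Loop through the rows and send the relevant pieces to the worker.
results <- do.call(rbind, lapply(seq_len(nrow(grid)), function(i) {
  row <- grid[i, ]
  data_subset <- data[data[[by_col]] == row$by, , drop = FALSE]
  out <- run_one(data_subset, formulas[[row$formula]],
                 lineage_defs[[row$lineage_def]])
  # Label each result with the combination that produced it.
  cbind(by_col = row$by, lineage_def = row$lineage_def, out,
        stringsAsFactors = FALSE)
}))
```

Here results has one row per combination, with by_col, lineage_def, and formula columns as the headings.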

DBecker7 commented 5 months ago

This is an opportunity to fix many inefficiencies. For instance, fuse() gets called a couple of times. This is unnecessary: it can be called once at the start, or it can be called individually for each run. It's probably best to run it individually for each run right now, since there may be several lineage definitions provided and we can just loop through an expand.grid() object defined by the argument lists.
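A possible middle ground, sketched below with a stand-in for fuse(), is to fuse once per lineage definition and reuse the result across the other arguments. Everything here is an assumption: fake_fuse() is not the real fuse() and its signature may differ, the "mutation" join column is invented, and the method names are placeholders.

```r
# Counter kept in an environment so the number of fuse calls can be inspected.
counter <- new.env()
counter$n <- 0

# Hypothetical stand-in for fuse(): joins the data to a lineage definition.
fake_fuse <- function(data, lineage_def) {
  counter$n <- counter$n + 1
  merge(data, lineage_def, by = "mutation")
}

# Toy inputs; none of these come from provoc.
data <- data.frame(mutation = c("m1", "m2", "m3"),
                   count = c(5, 0, 3), coverage = c(10, 10, 10))
lineage_defs <- list(
  base = data.frame(mutation = c("m1", "m2"), A = c(1, 0)),
  extended = data.frame(mutation = c("m1", "m2", "m3"),
                        A = c(1, 0, 0), B = c(0, 1, 1))
)

# Fuse once per lineage definition, keyed by name.
fused <- lapply(lineage_defs, function(ld) fake_fuse(data, ld))

# Every run looks up its fused data instead of calling the fuse step again.
grid <- expand.grid(method = c("method_a", "method_b"),
                    lineage_def = names(lineage_defs),
                    stringsAsFactors = FALSE)
n_rows <- vapply(seq_len(nrow(grid)),
                 function(i) nrow(fused[[grid$lineage_def[i]]]),
                 integer(1))
```

With two lineage definitions and two methods there are four runs but only two calls to the fuse step. Calling it inside the loop instead would be simpler to write and cost four calls.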