Closed martinmodrak closed 11 months ago
Thanks @martinmodrak, this really improved the code.
There appears to be no change to the docs website or the readme (presumably because you are working on a separate JOSS branch ? But the README is identical there as well ... ) making it a bit hard for me to see and test the changes made... Could you update the docs or show me some examples on how the new interface looks like?
Hi @martinmodrak, Thanks for the comment. Much of the work on this issue centered around addressing the background bug you accurately identified, which occurred when selecting a single covariate in the study. Regarding the interface, we have eliminated the specific ordering of data and now gather it according to each column name. In forthcoming major updates, we plan to incorporate a formula for data handling. We updated the documentation. In the readme file example, the following input values have been added.
outcome_col = "Y",
treatment_col = "treat",
covariates_col = paste0("cf", seq(1,6)),
While the interface is usable as is, I think requiring a specific ordering and naming scheme of a data.frame is an anti pattern.
A simple remedy would be to add parameters for the names of response, treatment and confounder columns (and then immediately translate internally to the fixed format) - the defaults for those parameters could then respect the current format. A formula-style syntax would be even nicer (e.g. the README examples could be specified via
formula = Y ~ treat, confounders = ~ cf1 + cf2 + cf3 + cf4 + cf5 + cf6
Additionally, the input format is not very thoroughly checked and interpreted, I've encountered two issues - in both cases, I am modifying the GP example from README.md by inserting some code just below line 3 (
sim_data <- generate_synthetic_data(sample_size = 500, gps_spec = 1)
)Just a single covariate:
Using
results in
(presumably
drop = FALSE
is missing in some indexing statement)Renaming outcome
Using
Completes, but for some reason provides different results than the original as well as a version with just the new irrelevant covariate without the renaming (i.e. just having
sim_data$new_cf <- rnorm(500)
). It appears the code sometimes refer to the outcome columnY
by name and sometimes by index.