DeclareDesign / DeclareDesign

DeclareDesign: Declare and Diagnose Research Designs
https://declaredesign.org
Other
99 stars 12 forks source link

dplyr handler refactor #447

Closed nfultz closed 3 years ago

nfultz commented 3 years ago

Moves all dot-mangling into rename_dots, with better logic for supporting dplyr style functions as step handlers.

Note: unfortunately, handler=subset now matches arguments correctly but still doesn't work because it uses base NSE rather than rlang quosures, so the predicate is evaluated in the declaration environment instead of on the provided data frame. There's no way of knowing if a UDF handler uses base or dplyr nse internally. This should probably be documented somewhere, perhaps a custom handlers vignette.

closes #444

graemeblair commented 3 years ago

This is awesome. It looks like this is all working. @nfultz is WIP just about documentation? I can merge in and add docs if so.

nfultz commented 3 years ago

Yes, just docs and testing - enough time has passed between 3.5 / 3.6 and now that I'm not entirely confident that this works as intended with the older versions of dplyr . The other part I'm not sure about is if anything breaks when tibbles get used in new places - I think there's tests for the data flow but not for estimates / estimands.

graemeblair commented 3 years ago

Seeing a number of errors in the tests. e.g.:

declare_population(N = 10, latent = seq(0, 1, length.out = N)) +
    declare_measurement(observed = as.numeric(cut(latent, breaks = seq(0, 1, length.out = 6), include.lowest = TRUE)))

Error: Error in step 2 ():
    Error in fabricate_mode_error(): You must do exactly one of: 
1) One or more level calls, with or without existing data 
2) Import existing data and add new variables 
3) Provide an `N` without importing data or creating levels 

Most seem to be related to using fabricatr either in declare_population or declare_measurement.

Others are about not throwing declare time errors for things like: declare_diagnosands(keep_defaults = FALSE)

I don't see these errors on master now.

nfultz commented 3 years ago

I've finished removing the remains of the strict-implicit-data-arg validation that was from before the great plus sign migration.

the official rules for handler argument matching are now something like:

If there are free variables on the handler, and .data/data is not named as a dot argument, and no dot's RHS is the data symbol, then:

add symbol(data) as an argument to the dots in first position. If the handler has a formal argument named .data/data then set the name of the new argument to that.

Then we record that if we did this on the object, rematch the dots to the handler and copy names over, such that all further argument matching is by name and not position.

When evaluating a step, if we had added an implicit data arg named data, but the step is invoked with data=NULL, remove it. This is what makes fabricate(N, ...) work correctly. This should naturally occur in the first step of a design, but steps may also be called directly. dplyr-as-handler should fail anyway when used in first position without an existing data set, and so the corresponding dplyr error message will be displayed.

graemeblair commented 3 years ago

amazing. thank you @nfultz. merging in now and prepping for CRAN.