DOI-USGS / ds-pipelines-targets-example-wqp

An example targets pipeline for pulling data from the Water Quality Portal (WQP)

Simplify `p3_wqp_param_cleaning_info` #79

Closed lekoenig closed 2 years ago

lekoenig commented 2 years ago

☝️ I think there may be interest in this kind of pattern in the future, so it is good to know what options exist within targets for it.

I'm thinking that because targets is R native (yay!), when the script runs

source("3_harmonize/src/clean_conductivity_data.R")
source("3_harmonize/src/clean_temperature_data.R")

(at the top of your 3_harmonize)

you add objects of class "function" to the environment that can be accessed. Targets will track those changes appropriately if they are referenced as objects, like you did in your commit to change to a list that had the objects instead of strings for the function names.
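A minimal base-R sketch of why referencing the function as an object matters. It assumes targets finds dependencies by statically scanning code for symbols, and uses `all.names()` only as a rough stand-in for that scan; the one-line function body is a hypothetical placeholder, not the real cleaning function.

```r
# Hypothetical stand-in for a function sourced from 3_harmonize/src/
clean_conductivity_data <- function(wqp_data) wqp_data

# Sourcing (or defining) the function adds an object of class "function"
fxn_class <- class(clean_conductivity_data)

# Two ways a target command could refer to the function
call_by_object <- quote(clean_conductivity_data(p3_wqp_data_aoi_clean_grp))
call_by_string <- quote(
  do.call("clean_conductivity_data", list(p3_wqp_data_aoi_clean_grp))
)

# all.names() lists the symbols in an expression; string constants are not symbols
object_is_visible <- "clean_conductivity_data" %in% all.names(call_by_object)
string_is_visible <- "clean_conductivity_data" %in% all.names(call_by_string)
```

Under that assumption, only the first form exposes the function name to a symbol-based scan, which is consistent with the switch from strings to function objects.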

Since this pattern seems well suited for re-use, the one picky thing I'd point out is that the coding is currently redundant (e.g., you have to name the function twice with list(clean_conductivity_data = clean_conductivity_data)). I think you need the function name, which is why the current code does this. But perhaps you can instead use a vector of function objects (reverting to closer to what you had, but not turning them into strings) and then you could use substitute to extract the function names from the objects, like this:

tar_target(
    p3_wqp_param_cleaning_info,
    tibble(
        parameter_grp_name = c('conductivity', 'temperature'),
        cleaning_fxn = c(clean_conductivity_data, clean_temperature_data)
    )
),

# then in `fxn_to_use()`, you do something like

fxn_to_use <- p3_wqp_param_cleaning_info %>%
    filter(parameter_grp_name == unique(p3_wqp_data_aoi_clean_grp$parameter)) %>%
    pull(cleaning_fxn) # skip `names()` call

# ... within the `if {}`
do.call(as.character(substitute(fxn_to_use)), list(wqp_data = p3_wqp_data_aoi_clean_grp))
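One caveat on the `substitute()` idea, sketched here in base R: `substitute()` captures the expression that was written, not the original name of the object it points to. `fxn_name()` is a hypothetical helper introduced only for this illustration, and the function bodies are placeholders.

```r
# Hypothetical stand-ins for the sourced cleaning functions
clean_conductivity_data <- function(wqp_data) wqp_data
clean_temperature_data <- function(wqp_data) wqp_data

# Hypothetical helper: return the expression supplied for `f` as text
fxn_name <- function(f) deparse(substitute(f))

# Passing the function by its own name recovers that name
direct_name <- fxn_name(clean_conductivity_data)

# c() on function objects builds a list, not an atomic vector
cleaning_fxns <- c(clean_conductivity_data, clean_temperature_data)
fxn_to_use <- cleaning_fxns[[1]]

# After the function is stored and looked up, only the variable name is captured
indirect_name <- fxn_name(fxn_to_use)
```

Here `direct_name` is `"clean_conductivity_data"` while `indirect_name` is `"fxn_to_use"`, so the original function name would likely need to be stored separately if it is still wanted once the function has been pulled out of the table.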

But `do.call()` also accepts a function object instead of a string, so I actually think your current code would work with that change to what `fxn_to_use` is(?)

`do.call(fxn_to_use, list(wqp_data = p3_wqp_data_aoi_clean_grp))` should work whether `fxn_to_use` is the string for the function or the function object itself...?

since

do.call(mean, args = list(x = c(1,2,4,5,6,3,4)))
# and
do.call("mean", args = list(x = c(1,2,4,5,6,3,4)))

do the same thing.
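A small base-R sketch of the same equivalence using stand-in cleaning functions. The parallel vectors below only mimic the two tibble columns, and the `cleaned_by` column and function bodies are hypothetical. One detail to watch: because `c()` on functions yields a list, selecting a single match still returns a list of length one (as `pull()` on a list column would), so the function itself is extracted with `[[1]]` before calling `do.call()`.

```r
# Hypothetical stand-ins that tag the data so the result shows which one ran
clean_conductivity_data <- function(wqp_data) {
  wqp_data$cleaned_by <- "conductivity"
  wqp_data
}
clean_temperature_data <- function(wqp_data) {
  wqp_data$cleaned_by <- "temperature"
  wqp_data
}

# Parallel vectors mimicking the columns of p3_wqp_param_cleaning_info
parameter_grp_name <- c("conductivity", "temperature")
cleaning_fxn <- c(clean_conductivity_data, clean_temperature_data) # a list

# Selecting the matching entry gives a list of length one
selected <- cleaning_fxn[parameter_grp_name == "temperature"]
fxn_to_use <- selected[[1]] # the function object itself

# Toy data standing in for p3_wqp_data_aoi_clean_grp
wqp_data <- data.frame(value = c(1, 2))

# do.call() accepts either the function object or its name as a string
by_object <- do.call(fxn_to_use, list(wqp_data = wqp_data))
by_string <- do.call("clean_temperature_data", list(wqp_data = wqp_data))
same_result <- identical(by_object, by_string)
```

Passing the object keeps the function name visible as a symbol in the pipeline code, whereas the string form hides it inside a constant.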

_Originally posted by @jread-usgs in https://github.com/USGS-R/ds-pipelines-targets-example-wqp/pull/75#discussion_r931148385_