Closed kyle-messier closed 6 months ago
Perhaps this issue overlaps heavily with #176?
I will look at the targets package.
Yes, sorry, I forgot about that issue. At first glance, targets appears to be the package of choice, but we'll need to look into it more.
targets definitely seems like the way to go. There is a tutorial written by the authors here. In short, it keeps track of dependencies in a pipeline, keeps track of what needs to be re-run when something changes, and produces a nice visualization of the pipeline.
@sigmafelix @dzilber @eva0marques @dawranadeep @Sanisha003 @mitchellmanware @MAKassien
I played around with implementing targets on a dev branch. Implementation seems straightforward, although I feel our code base is not ready for it yet. We need to have functions (i.e., targets) ready to connect in a pipeline. With that said, I think the pipeline can motivate us to further organize our code. Along the lines of our discussions to move the download_* functions in Rinput/ to R/, I think the approach is to create a suite of functions in R/ that describe the high-level steps in the analysis pipeline. Similar to the pipeline in the readme:
Then each of those has a sub-suite of functions that actually do the work. I think that will make defining the targets for a reproducible pipeline more straightforward.
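To make the two-layer idea concrete, here is a minimal base R sketch. All function names and values are hypothetical placeholders, not the actual functions in R/ or the steps in the readme; the point is only that each high-level step takes the previous step's output as its input, which is the shape targets needs.

```r
# --- Sub-suite: small helpers that do the actual work (all hypothetical) ---
read_site_table <- function() {
  # Placeholder values for illustration only, not real monitoring data.
  data.frame(site_id = c("A", "B", "C"), pm25 = c(8.1, 12.4, 9.7))
}

add_log_response <- function(sites) {
  sites$log_pm25 <- log(sites$pm25)
  sites
}

# --- High-level steps: one function per pipeline stage ---
get_sites <- function() read_site_table()
prepare_response <- function(sites) add_log_response(sites)
summarize_response <- function(prepared) mean(prepared$log_pm25)

# Each step consumes the output of the previous one.
result <- summarize_response(prepare_response(get_sites()))
```

With steps written this way, each high-level function can later become the command of one target without further restructuring.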
If there are any thoughts, please let us know!
@dzilber For more "complicated" targets the package does describe how to implement so-called dynamic targets using factories. I'm wondering if your familiarity with factory functions could help interpret this. And better yet, perhaps you could utilize this as an example for demonstrating factory functions in our group meetings.
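As a starting point for that discussion, here is a rough sketch of a target factory as I understand it from the targets manual: an ordinary function that returns a list of target objects. The factory name, its arguments, and the file paths are all hypothetical. Note that the manual treats dynamic branching (the pattern argument of tar_target()) as a separate feature from factories.

```r
library(targets)

# Hypothetical factory: builds a file target and a reader target for one
# covariate source. `name` and `path` are illustrative arguments.
covariate_targets <- function(name, path) {
  file_name <- paste0(name, "_file")
  list(
    # tar_target_raw() takes the target name as a string and the command as
    # an already-built expression, which suits programmatic construction.
    tar_target_raw(file_name, path, format = "file"),
    tar_target_raw(
      paste0(name, "_data"),
      # Build the call read.csv(<name>_file) so the data target depends on
      # the file target.
      substitute(read.csv(f), list(f = as.symbol(file_name)))
    )
  )
}

# In _targets.R, the factory would expand into ordinary targets, e.g.:
# list(
#   covariate_targets("modis", "input/modis.csv"),
#   covariate_targets("other", "input/other.csv")
# )
```

The appeal is that repeated target patterns are written once and reused per data source.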
@sigmafelix @mitchellmanware
Is the zzz.R file like the ones articulated here? It looks like it is used to set up a working directory, etc.? Seems good for now - we'll see if it becomes obsolete down the line with targets implementation. Thanks!
@Spatiotemporal-Exposures-and-Toxicology Yes, I was thinking of making a vignette for initial settings but ended up adding a .onLoad call in zzz.R for guidance. I agree on removing this file as soon as the pipeline is completed and documented.
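For anyone unfamiliar with the pattern, a zzz.R of this kind might look roughly like the sketch below. This is not the contents of our actual file; the option name pipeline.data_dir and the default path are assumptions made up for illustration.

```r
# zzz.R (sketch): .onLoad() runs when the package namespace is loaded.
.onLoad <- function(libname, pkgname) {
  # Set a default data directory only if the user has not set one already.
  # "pipeline.data_dir" is a hypothetical option name.
  if (is.null(getOption("pipeline.data_dir"))) {
    options(pipeline.data_dir = file.path(getwd(), "input"))
  }
  invisible(NULL)
}
```

Once the pipeline defines its inputs explicitly, a default like this would no longer be needed.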
A very rudimentary example of a target configuration file:
Most of the file and function names are examples.
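For readers who want something concrete to react to, here is a minimal sketch of what such a _targets.R could look like. It is not the file from the dev branch: every file path and function name in it (read_sites, calc_covariates, impute_covariates, fit_base_model) is hypothetical and assumed to be defined under R/.

```r
# _targets.R (sketch; all file and function names below are hypothetical)
library(targets)

# Load the high-level step functions defined under R/.
tar_source("R")

list(
  # Track the raw input file so that edits to it invalidate downstream targets.
  tar_target(sites_file, "input/sites.csv", format = "file"),
  tar_target(sites, read_sites(sites_file)),
  tar_target(covariates, calc_covariates(sites)),
  tar_target(covariates_imputed, impute_covariates(covariates)),
  tar_target(base_model, fit_base_model(covariates_imputed))
)
```

With a file like this at the project root, targets::tar_make() runs whatever is out of date, targets::tar_outdated() lists the targets that would be re-run, and targets::tar_visnetwork() draws the dependency graph.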
The pipeline will need an imputation step. For various reasons, MODIS data have some missing days (from a few days to a month during 2018-2022), so the covariates calculated from them are missing those days as well: days without raw data simply do not appear in the covariate data.frames. Although we have already developed a function that checks whether any NAs exist in the outputs, the imputation function should be run before that NA check.
The imputation method (simple [e.g., median/mean for the week/month/etc.], linear, or ML-based) needs to be discussed.
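To seed the discussion, here is a small base R sketch of the two simplest options. The table and the ndvi values are made up for illustration. Because the missing days are absent rows rather than NAs, the first step restores them as explicit NA rows. The median here is taken over the whole toy series rather than per week or month.

```r
# Hypothetical covariate table with two days absent (2018-01-03 and 2018-01-04).
covar <- data.frame(
  date = as.Date(c("2018-01-01", "2018-01-02", "2018-01-05", "2018-01-06")),
  ndvi = c(0.40, 0.42, 0.48, 0.50)
)

# 1. Restore the missing days as explicit NA rows.
full <- data.frame(date = seq(min(covar$date), max(covar$date), by = "day"))
covar_full <- merge(full, covar, by = "date", all.x = TRUE)

# 2a. Simple option: fill with the median of the observed values.
median_fill <- covar_full$ndvi
median_fill[is.na(median_fill)] <- median(covar_full$ndvi, na.rm = TRUE)

# 2b. Linear option: interpolate between neighbouring observed days.
linear_fill <- approx(
  x = as.numeric(covar$date),
  y = covar$ndvi,
  xout = as.numeric(covar_full$date)
)$y
```

Either filled vector can then be passed to the existing NA check, which should find nothing left to flag within the observed date range.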
After #255 is completed, additional changes in DESCRIPTION and the roxygen2 documentation need to follow:
- Remotes: entries in DESCRIPTION for spun-off packages on GitHub
- @import or @importFrom lines in roxygen2 documentation in main functions that will remain in the pipeline package
targets seems to be the latest and greatest package for pipeline development in R.