cole-brokamp opened this issue 2 years ago
`degauss_run` works well with {targets}, but {targets} only caches the target objects and re-runs any function that depends on a target that changed upstream, so it doesn't help with the s3_downloads folder problem (e.g., in an address file -> geocoded -> narr pipeline, if one address changes, the entire pipeline re-runs and the narr s3 chunks are not cached because of the temp folder associated with `degauss_run`).
Based on our discussion today, it sounds like the best path forward is to not spend time/energy fixing this now with {targets} or anything more advanced.
Instead, add an example to the vignette that shows how to use https://memoise.r-lib.org/ to speed up repeated calculations. Something like:
```r
# cache results on disk in the project's data-raw folder
fc <- memoise::cache_filesystem(fs::path(fs::path_wd(), "data-raw"))

# memoise degauss_run, ignoring `quiet` when hashing arguments
degauss_run <- memoise::memoise(degauss_run, omit_args = c("quiet"), cache = fc)
```
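With the memoised copy in place, a repeated call with identical inputs should be served from the filesystem cache instead of re-running the container. A sketch (the `addresses` data frame and the `geocoder` image argument are illustrative, assuming `degauss_run`'s usual data-frame-in, data-frame-out interface):

```r
# first call runs the DeGAUSS container and writes the result to data-raw/
geocoded <- degauss_run(addresses, image = "geocoder")

# an identical second call returns the cached result without touching docker;
# a different `quiet` value still hits the cache because quiet is omitted
# from argument hashing above
geocoded <- degauss_run(addresses, image = "geocoder", quiet = TRUE)
```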
Make sure to add a note explaining how/why caching is not expected to work normally with DeGAUSS when using DeGAUSS from R with `dht::degauss_run` (e.g., `fs::path_abs("s3_downloads")`).

s3_downloads:
In the container, set

```r
options(s3.download_folder = "/s3_downloads")
```

and on the call to the container, find the working directory and bind mount it to /s3_downloads:

```r
glue::glue("-v {getwd()}:/s3_downloads")
```
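A minimal sketch of how that call could be assembled (the image name and the `system2()` invocation are illustrative, not dht's actual internals):

```r
# bind mount the current working directory into the container at
# /s3_downloads so downloaded s3 chunks persist between runs
mount_s3 <- glue::glue("-v {getwd()}:/s3_downloads")

system2("docker", c(
  "run", "--rm",
  mount_s3,
  glue::glue("-v {getwd()}:/tmp"),    # usual DeGAUSS input/output mount
  "ghcr.io/degauss-org/narr:latest",  # illustrative image
  "my_address_file.csv"
))
```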
Create intermediate CSV files and caches, and run Docker, from `tools::R_user_dir("dht", "cache")` instead of `tempfile()`.
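A sketch of what that could look like (the `degauss_cache_dir()` helper is hypothetical):

```r
# hypothetical helper: a persistent per-user cache directory that,
# unlike tempdir(), survives across R sessions
degauss_cache_dir <- function() {
  d <- tools::R_user_dir("dht", "cache")
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
  d
}

# write the intermediate csv there instead of tempfile(fileext = ".csv")
csv_path <- file.path(degauss_cache_dir(), "my_address_file.csv")
```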