degauss-org / dht

DeGAUSS helper tools
http://degauss.org/dht
GNU General Public License v3.0

degauss_run cannot use cache or downloaded files #85

Open cole-brokamp opened 2 years ago

cole-brokamp commented 2 years ago
erikarasnick commented 1 year ago

degauss_run works well with {targets}, but {targets} only caches the target objects and re-runs any function that depends on an upstream target that changed. So it doesn't help with the s3_downloads folder problem: in an address file -> geocoded -> narr pipeline, if one address changes, the entire pipeline re-runs and the narr s3 chunks are not cached, because degauss_run works out of a temporary folder.
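For context, the pipeline described above might look like this as a {targets} plan. This is a minimal sketch, not the actual project code: the file name, image names, and the argument names passed to dht::degauss_run are assumptions.

```r
# _targets.R -- hypothetical sketch of the address -> geocoded -> narr pipeline;
# "my_addresses.csv" and the image names are placeholders, and the degauss_run
# argument names are assumptions
library(targets)

list(
  # editing even one address invalidates this file target...
  tar_target(address_file, "my_addresses.csv", format = "file"),
  # ...which re-runs geocoding, and then the narr step, from scratch;
  # any s3 chunks downloaded inside the container's temp folder are lost
  tar_target(geocoded, dht::degauss_run(address_file, image = "geocoder")),
  tar_target(narr, dht::degauss_run(geocoded, image = "narr"))
)
```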

cole-brokamp commented 1 year ago

Based on our discussion today, it sounds like the best path forward will be to not spend time/energy on fixing this now with {targets} or anything more advanced.

Instead, add an example in the vignette that shows how to use https://memoise.r-lib.org/ to speed up repeat calculations. Something like:

```r
fc <- memoise::cache_filesystem(fs::path(fs::path_wd(), "data-raw"))
degauss_run <- memoise::memoise(degauss_run, omit_args = c("quiet"), cache = fc)
```

Make sure to add a note explaining how and why caching is not expected to work normally when using DeGAUSS from R with dht::degauss_run
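The caching behavior memoise provides can be sketched without DeGAUSS at all; here a slow plain function stands in for degauss_run (slow_geocode is a hypothetical placeholder, and the cache directory is just a temp dir for the demo):

```r
# minimal self-contained sketch of filesystem memoisation with {memoise};
# slow_geocode() is a hypothetical stand-in for degauss_run()
library(memoise)

fc <- cache_filesystem(file.path(tempdir(), "memoise-cache"))

slow_geocode <- function(address) {
  Sys.sleep(1)        # simulate a slow DeGAUSS container run
  toupper(address)
}

cached_geocode <- memoise(slow_geocode, cache = fc)

r1 <- cached_geocode("123 main st")  # slow: runs the function
r2 <- cached_geocode("123 main st")  # fast: served from the filesystem cache
stopifnot(identical(r1, r2))
```

Because the cache is on disk (not in memory), the second call is fast even in a fresh R session, which is what makes this useful across repeated degauss_run invocations.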

cole-brokamp commented 1 year ago

In container,

```r
options(s3.download_folder = "/s3_downloads")
```

and on call to container, find working directory and bind mount to /s3_downloads

```r
glue::glue("-v {getwd()}:/s3_downloads")
```
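That bind-mount flag would be slotted into the full docker invocation. A sketch of what that assembly could look like; the image name, input file, and overall command shape are assumptions, not the actual dht internals:

```r
# hypothetical sketch of building the docker run command with the
# /s3_downloads bind mount; image/infile and the command shape are assumed
library(glue)

image  <- "degauss/geocoder:3.3.0"  # hypothetical image name
infile <- "my_addresses.csv"        # hypothetical input file

docker_cmd <- glue(
  "docker run --rm",
  "-v {getwd()}:/s3_downloads",  # persist s3 downloads in the working dir
  "-v {getwd()}:/tmp",
  "{image} {infile}",
  .sep = " "
)
cat(docker_cmd, "\n")  # inspect; system2("docker", ...) would actually run it
```

With the working directory mounted at /s3_downloads and options(s3.download_folder = "/s3_downloads") set in the container, downloaded chunks land outside the container and survive re-runs.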
cole-brokamp commented 1 year ago

create intermediate csv files and caches in, and run docker from, tools::R_user_dir("dht", "cache") instead of tempfile()
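A minimal sketch of that idea using only base R (tools::R_user_dir requires R >= 4.0; the file name and contents are placeholders):

```r
# use a persistent per-package cache directory instead of tempfile(), so
# intermediate files and s3 downloads survive across R sessions
cache_dir <- tools::R_user_dir("dht", which = "cache")
dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)

# hypothetical intermediate csv written into the persistent cache
out_csv <- file.path(cache_dir, "geocoded.csv")
write.csv(data.frame(id = 1, lat = 39.1, lon = -84.5), out_csv, row.names = FALSE)
stopifnot(file.exists(out_csv))
```

Unlike tempfile(), this directory follows the platform's user-cache conventions and is stable between sessions, so docker runs launched from it can reuse previously downloaded chunks.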