elliewhite-usgs / ds-pipelines-targets-1

[USGS] training
https://lab.github.com/USGS-R/intro-to-targets-pipelines

Why use a dependency manager? #5

Closed github-learning-lab[bot] closed 3 years ago

github-learning-lab[bot] commented 3 years ago

We're asking everyone to invest in the concepts of reproducibility and efficiency of reproducibility, both of which are enabled via dependency management systems such as remake, scipiper, drake, and targets.

Background

We hope that the case for reproducibility is clear - we work for a science agency, and science that can't be reproduced does little to advance knowledge or trust.

But the investment in efficiency of reproducibility is harder to boil down into a zingy one-liner. Many of us have embraced this need because we have been bitten by issues in our real-world collaborations, and found that data science practices and a reproducibility culture offer great solutions. Karl Broman is an advocate for reproducibility in science and is on the faculty at UW-Madison. He has given many talks on the subject, and we're going to ask you to watch part of one of them so you can be exposed to some of Karl's science challenges and solutions. Karl will be talking about GNU make, which is the inspiration for almost every modern dependency tool that we can think of. Click on the image to kick off the video.

reproducible workflows with make

:computer: Activity: Watch the above video on make and reproducible workflows up until the 11-minute mark (you are welcome to watch more).

Use a GitHub comment on this issue to let us know what you thought was interesting about these pipeline concepts using no more than 300 words.


I'll respond once I spot your comment (refresh if you don't hear from me right away).

elliewhite-usgs commented 3 years ago

I found the distinction between reproducible and replicable/correct interesting. I think reproducible is the first step and the least we could do to then encourage replicability and find the errors if there are any. I also like how I've thought about the steps to reproducibility informally when I've frustrated myself and never connected them under the umbrella of reproducibility. Really nice talk.

elliewhite-usgs commented 3 years ago

Why are you stuck, bot? Commenting to see if I can trick you.

lindsayplatt commented 3 years ago

It accidentally commented over on #4 because you commented after the merge (which is totally fine!). Pasting that content here and will let you continue :)

Great comments @whiteellie! :sparkles:

You could consider GNU make to be a great-grandparent of the packages we referred to early in this lesson (remake, scipiper, drake, and targets). Will Landau, the lead developer of targets, has added a lot of useful features to dependency management systems in R, and has a great way of summarizing why we put energy into using these tools: "Skip the work you don't need."
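As a rough illustration of "skip the work you don't need", here is a minimal Python sketch of a make-style freshness check. It is not how make or targets is actually implemented: the function names `is_outdated` and `build` are hypothetical, and the sketch assumes that comparing file modification times is a good enough signal (make compares timestamps, while targets tracks hashes of code and data instead).

```python
import os
import tempfile


def is_outdated(target, dependencies):
    """Return True if the target is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)


def build(target, dependencies, recipe):
    """Run the recipe only when the target is out of date.

    Returns True if the recipe ran and False if the work was skipped.
    """
    if is_outdated(target, dependencies):
        recipe()
        return True
    return False


if __name__ == "__main__":
    # Hypothetical two-file pipeline: out.txt is derived from in.txt.
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "in.txt")
        out = os.path.join(workdir, "out.txt")
        with open(src, "w") as f:
            f.write("raw data")

        def recipe():
            # Stand-in for an expensive processing step.
            with open(src) as fin, open(out, "w") as fout:
                fout.write(fin.read().upper())

        print("first build ran:", build(out, [src], recipe))
        print("second build ran:", build(out, [src], recipe))
```

The second call does nothing because the output is already at least as new as its input; touching `in.txt` would trigger a rebuild on the next call.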

Next, we'd like you to check out a short part of Will's video on targets.

reproducible workflows with R targets

:tv: Activity: Watch the video on targets from at least 7:20 to 11:05 (you are welcome to watch the full talk if you'd like).

Use a GitHub comment on this issue to let us know what contrasts you identified between solutions in make and what is offered in R-specific tools, like targets. Please use fewer than 300 words. Then assign this issue to your onboarding cohort team member so they can read what you wrote and respond with any questions or comments.


When you are satisfied with the discussion, you can close this issue and I'll direct you to the next one.

elliewhite-usgs commented 3 years ago

I'm not sure Will Landau talked much about make, and he presents targets as a "make-like" tool. But he says that targets is fundamentally designed for R, favors functions, treats files as R objects, and is apparently better than drake. Other than it being in the R world, though, I'm not sure whether make is much different in requiring functions and handling files as objects (?)

lindsayplatt commented 3 years ago

You're right - I don't think he touched on make in that video, but I think Karl Broman's video gives some ideas about what it is all about. To me, the biggest difference and advantage is just what you said - targets is fundamentally designed for R. You don't need to learn a whole other language in order to write a robust pipeline, and that is HUGE when it comes to sharing our code with others outside our group and asking others in USGS to adopt pipelining practices. Asking someone to learn a new package is not nearly as big an ask as learning a new language!

github-learning-lab[bot] commented 3 years ago


When you are done poking around, check out the next issue.