Rework handling of demonstration workflow

multimeric commented 1 year ago

This is currently just a WIP. Only really the branch.Rmd episode has been finished, because I wanted feedback before I changed everything. Here is an HTML render of how this page looks: branch.html.txt
Workflow plans have moved into files/plans, with one plan per file
User-facing utility functions like clean_penguin_data have moved into files/tar_functions
Then, whenever we want to show a script to a user (which might be a plan, or a utility function), we just add a file: chunk such as: https://github.com/carpentries-incubator/targets-workshop/blob/ea88ee9d3438da9ea91dbed5621140859c58d13c/episodes/branch.Rmd#L241-L245
Whenever we want to show a subset of a script to a user, we use readLines(file)[lines]: https://github.com/carpentries-incubator/targets-workshop/blob/ea88ee9d3438da9ea91dbed5621140859c58d13c/episodes/branch.Rmd#L178-L180
Finally, whenever we want to run a workflow, we create a temporary directory as a variable so we can re-use it, and then run the workflow inside of that directory using execute_plan(): https://github.com/carpentries-incubator/targets-workshop/blob/ea88ee9d3438da9ea91dbed5621140859c58d13c/episodes/branch.Rmd#L74-L82
Thus, there is only one copy of each script used throughout the repo, which is very easy to edit.
Closes #16.

github-actions[bot] commented 1 year ago

:ok: Pre-flight checks passed :smiley:

This pull request has been checked and contains no modified workflow files or spoofing.

Results of any additional workflows will appear here when they are done.

multimeric commented 1 year ago

@joelnitta

joelnitta commented 1 year ago

Thanks @multimeric for this! Sorry for the delay in responding.

I think this way of doing things in principle sounds great. Adhering to DRY should prevent problems as we modify parts of the lesson, add new content, etc.

A few comments:

It is hard to run the code in a local session because the Rmd is being knit from episodes as the root, whereas my terminal opens with the entire lesson folder as the root. Can you fix this?
Some of the content looks a bit different between the current lesson (https://carpentries-incubator.github.io/targets-workshop/branch.html) and this branch. For example, the output of example-lm-pipeline-inspect-show starts with source("R/packages.R") in the current lesson, but it look like this in your branch:
```
suppressPackageStartupMessages({
library(targets)
library(palmerpenguins)
library(tidyverse)
library(tarchetypes)
})
```
Can you change this to match the current lesson?
Another difference between your branch and the current lesson is that certain targets should be skipped since they were already built during the lesson (output of example-lm-pipeline-hide).
The following code should be shown before the output of example-lm-pipeline-inspect-show:

library(broom)
tar_load(combined_model)\
glance(combined_model)

Is it possible to use hash-pipe style chunk options throughout instead of {r, code=readLines("files/plans/plan_6.R")[20:28], eval=FALSE}? This would be helpful when we eventually translate the lesson into other languages.

multimeric commented 1 year ago

It is hard to run the code in a local session because the Rmd is being knit from episodes as the root, whereas my terminal opens with the entire lesson folder as the root. Can you fix this?

Okay, I only tested this with sandpaper::build_lesson() from the root of the repo. Can you explain how you are running it? I might have to add here or similar as a dependency to solve this.

Some of the content looks a bit different between the current lesson (https://carpentries-incubator.github.io/targets-workshop/branch.html) and this branch. For example, the output of example-lm-pipeline-inspect-show starts with source("R/packages.R") in the current lesson, but it look like this in your branch:

Okay I can change this back. Should R/packages.R contain a fixed set of packages, or should it always be magically generated for each workflow to contain all the dependencies?

Another difference between your branch and the current lesson is that certain targets should be skipped since they were already built during the lesson (output of example-lm-pipeline-hide).

Ohh, I wasn't sure why you were re-running the previous workflows in some cases but I now understand it's for this reason. I'll look into a better solution here too.

joelnitta commented 1 year ago

Can you explain how you are running it?

I have this repo as a workspace in VScode (equivalent of a project in RStudio). So when I run code interactively, it assumes the working directory is the repo root. I tried here() but it didn't play well with sandpaper. One work-around of course is just to setwd() to "episodes", but I try to avoid that function as much as possible.

Okay I can change this back. Should R/packages.R contain a fixed set of packages, or should it always be magically generated for each workflow to contain all the dependencies?

Throughout the lesson, packages get gradually added to R/packages.R. But since you don't need to actually show the real contents of that file every time, if you put it in episodes/files, it could just have all the required packages that ever get used.

Ohh, I wasn't sure why you were re-running the previous workflows in some cases but I now understand it's for this reason. I'll look into a better solution here too.

Thanks.

multimeric commented 1 year ago

I have this repo as a workspace in VScode (equivalent of a project in RStudio). So when I run code interactively, it assumes the working directory is the repo root. I tried here() but it didn't play well with sandpaper. One work-around of course is just to setwd() to "episodes", but I try to avoid that function as much as possible.

Hmm, are you using the Run Chunk button in VS Code?

It seems to work by just dumping the contents of the chunk into your current terminal, which is a bit inelegant. I note that if you set "terminal.integrated.cwd": "${workspaceFolder}/episodes" in your workspace settings, it will open the bash terminal but also the R terminal opened via "Create R Terminal" in that directory, and consequently the chunks will execute fine. Does that work well enough for you?

joelnitta commented 1 year ago

I usually just highlight the code and send it to the terminal, but as you point out that is the same as clicking "Run Chunk".

I tried setting .vscode/settings.json like this:

{
  "terminal.integrated.cwd": "${workspaceFolder}/episodes"
}

but when I run getwd() I still get this:

> getwd()
[1] "/Users/joelnitta/repos/targets-workshop"

Anyways, it's not that big of a deal. The important thing is that the lesson builds correctly.

multimeric commented 1 year ago

You will have to restart your whole terminal tab for that change to take effect, I think. But it should work.

joelnitta commented 1 year ago

Yeah I tried that, no dice. Anyways like I said it's not a big deal. We could just include a notice in the README about the working dir.

multimeric commented 1 year ago

Is it possible to use hash-pipe style chunk options throughout instead of {r, code=readLines("files/plans/plan_6.R")[20:28], eval=FALSE} ? This would be helpful when we eventually translate the lesson into other languages.

I've now done this, but I fear that the YAML-style knitr options can't be used here (with the :), and instead I have to use R-style options (with =), because I think that's the only way to get dynamic values like reading from a file as I do here.

joelnitta commented 1 year ago

I see, that's a bummer. I hope they implement it in Quarto sometime... I wonder if it's because Quarto is not specific to R? 🤔 Anyways, let's use the old-fashioned Rmd chunk options for just that one then.

multimeric commented 1 year ago

To be clear I can still use options like #| eval = FALSE, it's just that it will use the R syntax. Do you prefer that, or having them all at the top like {r, eval=FALSE}?

joelnitta commented 1 year ago

Please use the YAML style syntax (AKA the hashpipe, e.g., #| eval = FALSE), not inline like {r, eval=FALSE}

(https://www.youtube.com/watch?v=Q1TL6z2uKzE) (sorry couldn't help myself)

multimeric commented 1 year ago

One other question. I've just realised that, rather than storing all our R plans in files, we can define them in knitr chunks and then read them back out using knitr::knit_code$get("chunk-name") when we go to run the plan. This has the nice outcome of allowing you to edit the code in a single document, but also removing the redundancy I was worried about that. Do you prefer this approach?

carpentries-incubator / targets-workshop

Rework handling of demonstration workflow #17

:ok: Pre-flight checks passed :smiley: