carpentries / workbench

Repository for Discussions and Materials about The Carpentries Workbench
https://carpentries.github.io/workbench/
Creative Commons Attribution 4.0 International
17 stars 7 forks source link

child documents paths as R code cause validation error #74

Open trhallam opened 10 months ago

trhallam commented 10 months ago

Hi,

I'm having some issues with a more recent version of sandpaper where the deploy step is failing in Github.

Local builds of the site build without issue.

The problem:

I use r eval statemets within Rmd documents to dynamically include content depending on a configuration for the lesson I want to publish. This was working previously but there is now an issue I think in the way the deploy function finds and copies content. There don't appear to be any logs or useful output in the Github actions build process because the R output is limited in the deploy step.

I have created a MWE that demonstrates the failing build process.

https://github.com/trhallam/workbench-mwe/actions/runs/6392622732/job/17350256259

zkamvar commented 9 months ago

Hi @trhallam,

This is coming from {pegboard} and it's failing during the step that attempts to validate the lesson while gathering child documents.

Situation

The following R Markdown context cannot be parsed by {pegboard} without evaluating any of the code.

```{r load_config, include=FALSE}
snippets <- paste('files/snippets/', 'TEST1', sep='')

Working include:

Not-working include:


One of the things to know about pegboard is that it parses the structure of the document, but it does not evaluate any of the code, which is why the error reads:

Error in initialize(...) : the file '/home/runner/work/workbench-mwe/workbench-mwe/paste(snippets, /test-include.Rmd, sep = )' does not exist Calls: ... load_children -> read_children -> -> initialize


That being said, even if we _did_ evaluate the code supplied to the child parameter, we would still find ourselves in the same situation because the `snippets` object is defined in a separate chunk, which {pegboard} does not know about because it does not evaluate code inside the chunks:

``` r
tmp <- tempfile()
writeLines("```{r snip, child=file.path(snippet, 'test.txt')}\n```", tmp)
ep <- pegboard::Episode$new(tmp)
# this is the document we are working with ----------
ep$show()
#> ```{r snip, child=file.path(snippet, "test.txt")}
#> ```

# grab the child document expression ----------------
print(child <- xml2::xml_attr(ep$code, "child"))
#> [1] "file.path(snippet, \"test.txt\")"

# if we parse and evaluate it, it will fail ---------
print(parse(text = child))
#> expression(file.path(snippet, "test.txt"))
eval(parse(text = child))
#> Error in eval(parse(text = child)): object 'snippet' not found

Created on 2023-10-03 with reprex v2.0.2

Potential solutions

IIRC, you are looking to find a way to address https://github.com/carpentries/sandpaper/issues/81, which is important for the HPC group. I'm thinking there may be two ways of going about it, but knowing that we need to engineer the solution and open the pathways for contribution in {sandpaper} or {pegboard} before it's possible. I do want to set expectations right now and say that I am currently working part time on medical leave and may not be able to address this quickly, but I will do my best to present all the information I know.

Option One: parameterize and eval

One thing you could do at the moment, which is a bit wonky, but it will work in any situation is to duplicate snippets and use eval = snippet == <THING>. This way, {pegboard} will be able to easily track the child documents.

```{r setup, include = FALSE}
snippet <- "TEST1"

This will only load one snippet file depending on the value of snippet:


The drawback, of course is that the snippets are inherently tied to the parent documents and are a bit difficult to wrangle and add content

### Option Two: Parameterized reports

This is much closer to the spirit of https://github.com/carpentries/sandpaper/issues/81 where you could specify a file called `params.yaml`

```yaml
snippet: "TEST1"

and then in your R Markdown, you would be able to use this pattern without the need for the setup chunk.


```{r, child=file.path(params$snippets, 'test-include.Rmd'), eval=TRUE}

The caveat to this approach is that it will take a bit of engineering inside of both {sandpaper} and {pegboard} to make the params available since each R Markdown document is considered to run independently:

``` r 
tmp <- tempfile()
writeLines("```{r snip, child=file.path(params$snippet, 'test.txt')}\n```", tmp)
ep <- pegboard::Episode$new(tmp)
ep$show()
#> ```{r snip, child=file.path(params$snippet, "test.txt")}
#> ```
# If we assume that the params are available, the child document will evaluate correctly
params <- list(snippet = "files/TEST1")

# grab the child document expression ----------------
print(child <- xml2::xml_attr(ep$code, "child"))
#> [1] "file.path(snippet, \"test.txt\")"
eval(parse(text = child))
#> [1] "files/TEST1/test.txt"

Created on 2023-10-03 with reprex v2.0.2

I believe this is the correct way to go because it means that you can continue to use the same pattern you have been using (you can also replace the {{ site.param }} syntax with `r params$param`). The only thing I would be cautious about would be allowing the fields in params.yaml to evaluate as R code (which is happily avoided).

Setting Expectations

I want to reiterate, with several factors going on, adding this feature is going to take some time to implement. I will do my best to coach @carpentries/workbench-maintainers to tackle this, but please be patient.

zkamvar commented 9 months ago

Local builds of the site build without issue.

FWIW, I believe the local build works without issue because you have an old version of {pegboard} on your machine (as I just released the new version yesterday). I can add a patch that adds a catch for the child document processing for validation so that you can get back to a working state. The only caveat (and I believe this was true before the change) is that changes to the child documents in your case will not rebuild the parent document.

trhallam commented 9 months ago

Thanks for the quick clarification @zkamvar , no expectations here and I'm not an R expert, so which pathway is more appropriate is fine by me. Appreciate you looking into this.

Indeed the need for this comes from our versions of the HPC carpentry which has a lot of system specific configuration in it, where the lessons may change depending upon where or to who you are teaching. It would be appreciated if the quick patch could be added for now. I can keep in mind that a full site rebuild may be needed if things aren't refreshing appropriately during development.

trhallam commented 9 months ago

Option Two: Parameterized reports

In terms of the parameterized reports, I have loaded a yaml file in the pre-amble of each lesson using.

```{r load_config, include=FALSE}
library(yaml)
config <- yaml.load_file("lesson_config.yaml")
snippets <- paste('files/snippets/', config$snippets, sep='')```

Link to YAML

https://github.com/EPCCed/2023-06-28-uoe-hpcintro/blob/main/episodes/lesson_config.yaml

zkamvar commented 9 months ago

Thank you for the full context. This will really help understand what the requirements are. I have time to work on https://github.com/carpentries/pegboard/pull/140 and it should be released later today.

zkamvar commented 9 months ago

I just pushed the fix and it should be available in about an hour. I have confirmed that it works in our integration test: https://github.com/carpentries/workbench-integration-test/actions/runs/6398719182/job/17369438597

The error is an unrelated one that I will open a pull request to fix.

froggleston commented 1 month ago

@trhallam Could you confirm this is fixed? We could then close this issue :)

trhallam commented 1 month ago

Thanks for following up. The sites do build and deploy correctly to Gh Pages, but I still get a warning the "Deploy Site" steps.

Do I need to modify the approach?

https://github.com/EPCCed/2024-06-20-hpc-intro-shampton/actions/runs/9480863700/job/26122442483

zkamvar commented 1 month ago

I saw this in my notifications and felt that I should add a summary:

The underlying cause of this is not fixed. What was fixed in https://github.com/carpentries/pegboard/pull/140 was the lesson failing to build if a child document could not be found (for the ecologists in the room this is the r-strategist mode).

This will continue to be a problem until allowing paramterized reports via a yaml file is implemented (see https://github.com/carpentries/workbench/issues/74#issuecomment-1745120003 for a breakdown).

The reason this approach is not working is because pegboard doesn't know anything about the code inside the document, including the setup chunk, so it doesn't know what snippets means.

Note that the solution to this this is NOT to run the code in the setup chunk in {pegboard}... that would cause so many lessons to slow down so much. I see the solution in the following ways (summarized from https://github.com/carpentries/workbench/issues/74#issuecomment-1745120003):

  1. allow for a parameters yaml file for lesson authors to swap out different parameters of their lesson in both {pegboard} and {sandpaper} (this can be a new item in config.yaml called params: <path-to-config.yaml> so that authors can specify a custom file)
  2. read in the params via {pegboard} and have them available under params$
  3. instruct authors to specify the path to the snippets folder from the root of the lesson (e.g. snippets: files/snippets/EPCC_ARCHER2_slurm)
  4. have authors use child=file.path(params$snippets, "name-of-snippet") to specify the snippet in the chunk.