CredibilityLab / groundhog

Reproducible R Scripts Via Date Controlled Installing & Loading of CRAN & Git Packages
https://groundhogr.com/
GNU General Public License v3.0

groundhog stuck when knitting or building website #117

Closed amacanovic closed 3 months ago

amacanovic commented 3 months ago

First of all, thanks a lot for all your work on this package!

I had absolutely no issues with groundhog until very recently. I work on a project where we analyze all the data in .Rmd files that are knit into a website with code and results, for the sake of reproducibility. Recently, however, my knits and site builds have been getting stuck on the first chunk, where I load packages using groundhog. I can knit individual files by restarting the session (sometimes several times) and retrying until it works, but this is impossible when I render the whole site: one of the scripts inevitably gets stuck, and there is no way to restart it without breaking the whole build.

Here is an example of what I see when I run a chunk of an .Rmd file and it gets stuck as described above. It looks like an input field, but I can't interact with it. I suspect something similar happens when my knits get stuck:

[Screenshot: chunk output stuck at a "|>" prompt]

Within the "render" tab it just looks like this, without any progress being made:

[Screenshot: the Render tab, with no progress]

Here is the code:

library(groundhog)
packages_to_load <- c("readr", "dplyr", "tidyr", "PerformanceAnalytics",
                      "tidyverse", "RPostgres", "lubridate", "psych",
                      "digest", "DBI", "RODBC", "odbc", "gridExtra",
                      "panelr", "skimr", "foreach", "vegan",
                      "doParallel")

groundhog.library(packages_to_load, date = "2023-12-01")

This happens regardless of the date I choose or the packages I load. If I run the code in the console, it seems to work just fine; but knitting has been an issue. This has been happening in the last few weeks, but I am not sure why (I don't see any new releases on the changelog, and I haven't been updating the package).

I'm using R version 4.3.1 in RStudio, with groundhog_3.2.0.

Would you have an idea of what's going wrong here?

urisohn commented 3 months ago

Looking at the top screenshot, you will notice the cursor is a bit to the right ("| >"). That usually happens when groundhog is asking for user feedback, asking you to confirm something, or asking you to restart R. Is there something in your code or in knitting (I don't knit, so I am not familiar with how it works; I just have an idea of what it does) that may be diverting or suppressing the output on the screen? If you could see that output, I think you would know what's happening.

amacanovic commented 3 months ago

Definitely, I also assumed it wanted me to interact, but if I run the chunk from my .Rmd file, it gets stuck like this. I can exit by pressing Esc, but otherwise I cannot type anything into the field.

Pasting the same code into the console and running it there returns no errors; all packages load just fine:

[Screenshot: console output with all packages loaded]

None of the warnings seem important, they are all just noting that certain packages were built under another R version.

urisohn commented 3 months ago

Is it possible that when you are knitting you are changing the default folder where packages would be installed?

  1. Can you run .libPaths()[1] and check whether the path differs between the console and knitting?
  2. A related problem could be that knitting leads to creating a version-controlled environment through RStudio (using renv or something like that), so the packages you want available are not found by groundhog.
  3. In general, when you get those "| >" prompts, you can exit by typing "uncle". Maybe try that, though I think what must be happening is that knitting leads packages to be installed in an undocumented location that groundhog cannot find, so typing "uncle" won't be that useful.
  4. Running something like ip = installed.packages() while knitting and comparing the result with what you get in the console could also help diagnose this. In the past I have encountered issues that arise from RStudio making undocumented changes to the accessibility of packages (e.g., it will load into the environment packages that are referenced in your script before executing anything, without notifying you).
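
A minimal R sketch of suggestions 1 and 4 follows. Run it once in the console and once inside a chunk (or while knitting), then compare the two printouts. `diag_env()` is a hypothetical helper, not part of groundhog, and the package vector is a placeholder subset of the list from the original post.

```r
# Hypothetical helper (not part of groundhog): report which library is used,
# how many packages it holds, and which target packages are already loaded.
diag_env <- function(pkgs) {
  lib <- .libPaths()[1]
  ip  <- installed.packages(lib.loc = lib)
  list(
    lib_path       = lib,                                 # suggestion 1
    n_installed    = nrow(ip),                            # suggestion 4
    already_loaded = intersect(pkgs, loadedNamespaces())  # loaded before groundhog?
  )
}

# Placeholder subset of the packages from the original post
pkgs <- c("readr", "dplyr", "tidyr", "lubridate")
str(diag_env(pkgs))
```

If lib_path or n_installed differ between the two runs, knitting is using a different library. A non-empty already_loaded while knitting suggests that something loaded those packages before groundhog.library() ran.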
amacanovic commented 3 months ago

Thank you for your suggestions.

The paths are the same whether I run .libPaths()[1] in the console or knit an .Rmd. What confused me is that the error persisted even when I ran a chunk on its own, without knitting the whole file, but not when I ran the code in the console.

So when I ran the chunk, I got the same "|>" prompt and typed "uncle". When it exited, the following errors were printed: [screenshot of the error messages]

It seems that, when run from a chunk, the code just hangs on this error instead of printing it. Is this a familiar error? I never had any issues loading packages with groundhog in this project before.

urisohn commented 3 months ago

OK, so RStudio is indeed loading packages before you run groundhog and creating a package conflict. The output being suppressed is telling you to restart the R session: groundhog tried to remove the old version of ggplot2 and cannot load the new one while the old one is still there.

Can you send me the script up to that point, and some sense of what you do in RStudio before running it? Like, do you do File > New File > R Markdown and then paste that code at the top?

If I can reproduce it, I can probably fix it, but I am not familiar with knitting, so I will need some details of what exactly you are doing.

Also, a clarification: you say running from the console is fine but knitting is not. Presumably you try knitting, it does not work, and then you try the console. When you go from one to the other, 1) do you restart the R session or stay in the same session? 2) Are you prompted by the console to restart R?

urisohn commented 3 months ago

OK, I am familiarizing myself with knitting a bit. I am guessing you don't have any code above the groundhog.library() call, and what you are saying is that if you select the lines of code and press CTRL-ENTER they run, but when you click the 'Knit' button you run into trouble. Right?

urisohn commented 3 months ago

OK, I have a hypothesis. When you 'knit', you need to have already run groundhog.library() in the console, script, or chunks with all the packages you will use. While groundhog.library() can load packages when knitting, it cannot install them: if it installs something that conflicts with a previously loaded package, it asks interactively for R to be restarted, and knitr apparently cannot handle that interactive request.
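
Under this hypothesis, one way to turn the silent hang into a visible error is a guard placed just before the groundhog call. It stops when the session is non-interactive (as during knitting) and some requested packages are already loaded. check_preloaded() is a hypothetical helper sketched here, not a groundhog feature. It only checks the listed packages, not their dependencies (for example, ggplot2 pulled in by another package).

```r
# Hypothetical guard (not part of groundhog): fail fast with a readable error
# instead of hanging on an interactive prompt that knitr cannot answer.
check_preloaded <- function(pkgs, interactive_session = interactive()) {
  preloaded <- intersect(pkgs, loadedNamespaces())
  if (length(preloaded) > 0 && !interactive_session) {
    stop("Loaded before groundhog.library(): ",
         paste(preloaded, collapse = ", "),
         ". Restart R, or run groundhog.library() in the console before knitting.",
         call. = FALSE)
  }
  invisible(preloaded)
}

# Usage inside the chunk, right before the groundhog call:
# check_preloaded(packages_to_load)
# groundhog.library(packages_to_load, date = "2023-12-01")
```

The interactive_session argument defaults to interactive() and is exposed only so the check can be exercised outside a knit.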

So, when I run this:

library(groundhog)

packages_to_load <- c("readr", "dplyr", "tidyr", "PerformanceAnalytics",
                      "tidyverse", "RPostgres", "lubridate", "psych",
                      "digest", "DBI", "RODBC", "odbc", "gridExtra",
                      "panelr", "skimr", "foreach", "vegan",
                      "doParallel")

groundhog.library(packages_to_load, date = "2023-12-01")

and then knit, the Rmd file knits without problems.

I then changed the date to 2024-04-15, and again it worked if I first ran the code and then knitted. But if I knitted directly, without first running groundhog to update everything to 2024-04-15, it got stuck. But then I was able to knit it again.

In other words, I think the issue is that all packages need to be already installed, in the right version, when you press 'Knit'. I have a few ideas for good practices to achieve that, but first I want to make sure the diagnosis is correct. Could you check whether knitting works once you have first successfully installed all the packages?

amacanovic commented 3 months ago

I think what you suggest makes sense; but the problem is that I have already installed all those packages before knitting.

This is what I am doing:

  1. Open R studio
  2. Select the desired project
  3. Open an existing .RMD file (all of the files I am having issues with exist, some of them are as old as having been created in March and not edited since).
  4. Run the setup chunk
  5. Run the groundhog code chunk

Alternatively, if I want to knit the whole RMD into an HTML file, I click "Knit" at step 4.

For instance, I just opened the same RMD in two separate R sessions and ran the same chunk in each, from the notebook (CTRL-ENTER), not in the console and not by clicking "Knit". The only chunk preceding the groundhog package loading is the setup chunk. One session ran just fine; the other threw the error I showed above:

[Screenshot: the error message]

I then restarted the session in the second one, and now it ran just fine even when selecting CTRL-ENTER.

So I am seeing three types of problems:

  1. The error above, when running from the RMD file (when knitting, I just see the code stuck, no errors) or, sometimes, when pasting the code into the console
  2. It getting stuck just like I showed in my first post (with the "|>"), when running a chunk from the RMD file with CTRL-ENTER (when knitting, I just see the code stuck, no errors)
  3. Even though the packages have been installed through groundhog before, when loading, I get a request to restart the session (when running the code in the console)

Restarting the session when running the code from the RMD, or knitting, usually solves it eventually, and everything runs (and knits an HTML out). But this is not enough for me, since I want to compile all the HTML outputs into a website. For this, I use the "build" command that tries to build all scripts sequentially, and usually just gets stuck on one of the RMDs, so I can't restart the session for individual RMDs anymore.

E.g., here it is stuck on the first chunk indefinitely, I assume because it runs into some error with groundhog. But unlike other errors I've seen (e.g., when my code is incorrect), it does not throw an error; it just hangs there: [screenshot of the stuck build]

All the packages have been installed previously. Sorry for the confusion; I am not sure I see a clear pattern of when or why the errors are thrown.

urisohn commented 3 months ago

While I look into this further: 1) Can you try again without setting message=FALSE? 2) Do you have any explicit calls to packages in your code, like pkg::function()?

amacanovic commented 3 months ago
  1. If I set message=FALSE, it indeed throws an error straight away below the chunk, as below. Makes no difference when knitting, it still gets stuck.

  2. I just realized that I did have a few instances of calling packages explicitly, in particular cowplot::. I have removed them and rerun the code, and it now seems to work just fine. Could this have been the issue, with R loading the explicitly called packages first and this causing a conflict with groundhog?

urisohn commented 3 months ago

Yes, that's very likely the explanation. RStudio will load packages that appear in the script as pkg::, with some randomness in how long it takes to do so, which would lead to the erratic behavior you were experiencing. Indeed, the error messages you were getting mentioned cowplot, so that was probably it. If you include cowplot in your groundhog call, I think your code will run fine even with cowplot:: calls; it is when you do not include it that you get into trouble.
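
Following that advice, a quick pre-build check is to scan each .Rmd for pkg:: or pkg::: references and list any that are missing from the groundhog call, so they can be added to packages_to_load (for example, "cowplot"). The helpers below are hypothetical and use base R only. The regex is a rough approximation of R package names, and base packages such as stats or utils are ignored because they are not installed through groundhog.

```r
# Hypothetical helper: package names used as pkg:: or pkg::: in some lines.
find_ns_calls <- function(lines) {
  # A name (letter first, then letters/digits/dots) not preceded by another
  # name character and followed by "::" (this also covers ":::").
  pattern <- "(?<![A-Za-z0-9._])[A-Za-z][A-Za-z0-9.]*(?=::)"
  hits <- regmatches(lines, gregexpr(pattern, lines, perl = TRUE))
  sort(unique(unlist(hits)))
}

# Hypothetical helper: pkg:: references not covered by the groundhog call.
missing_from_groundhog <- function(rmd_lines, groundhog_pkgs) {
  base_pkgs <- rownames(installed.packages(priority = "base"))
  setdiff(find_ns_calls(rmd_lines), c(groundhog_pkgs, base_pkgs, "groundhog"))
}

# Usage (the file name is a placeholder):
# missing_from_groundhog(readLines("analysis.Rmd"), packages_to_load)
```

Any name this returns is a candidate to add to the groundhog.library() call before building the site.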

amacanovic commented 3 months ago

Thank you, this is very helpful. I will be mindful of this in the future, and close the issue now.