Anirban166 / Autocomment-atime-results

GitHub Action that automatically comments a plot and other atime-based results on PRs
https://github.com/marketplace/actions/autocomment-atime-results

Core GA-based debugging for my setup #15

Closed Anirban166 closed 8 months ago

Anirban166 commented 8 months ago

For future reference, I think it would be helpful to list the main issues I faced in making the atime::atime_pkg method work and how I solved them (or, more generally, how I built my envisioned workflow).

I figured things out through several step-by-step runs on the GitHub-hosted runner, but I'll only document the main roadblocks I hit, tracing back through my logs now.

Towards the end of last week, I went with a fresh approach (discarding #7 and #8, which were based on a reference workflow that does things differently and has issues) by first cloning my fork of data.table, which includes a predefined test.list in inst/atime/tests.R.

Locally, that produces results with the current development version of atime when running atime::atime_pkg("source") (where 'source' is the path to my cloned data.table repository).

But on the runner, the same approach ran into issues. Initially, the directory specification wasn't clear, which gave rise to:

atime::atime_pkg("/__w/<repository>/<repository>/data.table")
Error in dyn.load(file, DLLpath = DLLpath, ...) : 
  unable to load shared object '/__w/_temp/Library/git2r/libs/git2r.so':
  libgit2.so.28: cannot open shared object file: No such file or directory
Calls: <Anonymous> -> loadNamespace -> library.dynam -> dyn.load
Execution halted
Error: Process completed with exit code 1.
Anirban166 commented 8 months ago

Before encountering that, I ran the R command .libPaths(), which showed the two library locations where R packages are installed on the runner:

[1] "/__w/_temp/Library"         "/opt/R/4.3.2/lib/R/library"

I checked both of them for data.table, and also searched via find.package():

(try(find.package("data.table"), silent = TRUE))
[1] "Error in find.package(\"data.table\") : \n  there is no package called ‘data.table’\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<packageNotFoundError in find.package("data.table"): there is no package called ‘data.table’>

This was after including both remotes and data.table in the list of dependencies. While it was technically expected that a cloned repository wouldn't be recognized as an R package, it didn't have to be one for atime::atime_pkg(...) to work.

I realized the path specification was off. Specifying it relative to the home directory (with ~) got me past that point and gave me a new error:

Error: Error in read.dcf(pkg.DESC) : cannot open the connection
Calls: <Anonymous> -> read.dcf
In addition: Warning message:
In read.dcf(pkg.DESC) :
  cannot open compressed file '/github/home/data.table/DESCRIPTION', probable reason 'No such file or directory'
Execution halted
Anirban166 commented 8 months ago

I checked and the file did exist, but it was empty (verified with cat).

So I manually added the DESCRIPTION file from my data.table fork to the repository where I'm testing the actions, and used it to replace the one in the cloned repository on the runner:
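A check like this could be scripted so the job fails early, before R is invoked. The following is only a minimal sketch: the helper name `check_description` is hypothetical (not part of the workflow described here), and it assumes a POSIX shell such as the `sh` used by the runner.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if <pkg_dir>/DESCRIPTION exists,
# is non-empty, and declares a Package field.
check_description() {
  desc="$1/DESCRIPTION"
  if [ ! -s "$desc" ]; then            # -s: file exists and has size > 0
    echo "missing or empty: $desc" >&2
    return 1
  fi
  if ! grep -q '^Package:' "$desc"; then
    echo "no Package field in: $desc" >&2
    return 1
  fi
  echo "ok: $desc"
}
```

In a workflow step, something like `check_description data.table || exit 1` placed before the call to atime::atime_pkg would surface an empty DESCRIPTION with a clearer message than the read.dcf error.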

    - name: Clone data.table
      env:
        GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
      run: |
        # sudo apt-get install libcurl4-openssl-dev
        git clone https://github.com/Anirban166/data.table.git
        echo "\nStuff inside data.table:"
        ls data.table
        echo "\ndata.table/inst/atime contents:" # verifying that it's my clone and that it has the tests
        cat data.table/inst/atime/tests.R
        cp -f DESCRIPTION data.table/DESCRIPTION
        echo "\nDescription file copied, now checking its contents:"
        cat data.table/DESCRIPTION
        # realpath data.table/DESCRIPTION
        # echo "Script executed from ${PWD}"

(I also suspected that the container I'm using might be the problem. For instance, I created an issue years ago about a libcurl installation failure with their cml container.)

Stuff inside data.table:
cleanup
CODEOWNERS
configure
DESCRIPTION
GOVERNANCE.md
inst
LICENSE
Makefile
man
NAMESPACE
NEWS.0.md
NEWS.1.md
NEWS.md
_pkgdown.yml
po
R
README.md
src
tests
vignettes

data.table/inst/atime contents: 
test.list <- list(
  shallow.4440 = list(
    pkg.edit.fun=quote(function(old.Package, new.Package, sha, new.pkg.path){

      pkg_find_replace <- function(glob, FIND, REPLACE){
        atime::glob_find_replace(file.path(new.pkg.path, glob), FIND, REPLACE)
      }
      Package_regex <- gsub(".", "_?", old.Package, fixed=TRUE)
      Package_ <- gsub(".", "_", old.Package, fixed=TRUE)
      new.Package_ <- paste0(Package_, "_", sha)

      pkg_find_replace(
        "DESCRIPTION",
        paste0("Package:\\s+", old.Package),
        paste("Package:", new.Package))
      pkg_find_replace(
        file.path("src","Makevars.*in"),
        Package_regex,
        new.Package_)
      pkg_find_replace(
        file.path("R", "onLoad.R"),
        Package_regex,
        new.Package_)
      pkg_find_replace(
        file.path("R", "onLoad.R"),
        sprintf('packageVersion\\("%s"\\)', old.Package),
        sprintf('packageVersion\\("%s"\\)', new.Package))
      pkg_find_replace(
        file.path("src", "init.c"),
        paste0("R_init_", Package_regex),
        paste0("R_init_", gsub("[.]", "_", new.Package_)))
      pkg_find_replace(
        "NAMESPACE",
        sprintf('useDynLib\\("?%s"?', Package_regex),
        paste0('useDynLib(', new.Package_))
    }),

    N = quote(10^seq(3, 8)),
    expr = quote(data.table:::`[.data.table`(dt_mod, , N := .N, by = g)),
    setup = quote({
      n <- N/100
      set.seed(1L)
      dt <- data.table(
        g = sample(seq_len(n), N, TRUE),
        x = runif(N),
        key = "g")
      dt_mod <- copy(dt)
    })
  )
)

Description file copied, now checking its contents:
Package: data.table
Version: 1.15.99
Title: Extension of `data.frame`
Depends: R (>= 3.2.0)
...

That resolved this error, but although the path fix had seemed to work earlier, the first error appeared again when running the atime code:

> library(atime) 
library(ggplot2) 
library(data.table) 
atime::atime_pkg("~/data.table")
Error in dyn.load(file, DLLpath = DLLpath, ...) : 
  unable to load shared object '/__w/_temp/Library/git2r/libs/git2r.so':
  libgit2.so.28: cannot open shared object file: No such file or directory
Calls: <Anonymous> -> loadNamespace -> library.dynam -> dyn.load
Execution halted
Error: Process completed with exit code 1.

After some research, I realized that it was definitely git2r-specific, as opposed to requiring additional git operations (such as fetch and switch as I initially thought).
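One way to confirm a diagnosis like this is to ask the dynamic linker which dependencies of the compiled library it cannot resolve. The sketch below assumes a Linux runner where `ldd` is available; the helper name `missing_libs` is hypothetical, and the path is the one from the error message above.

```shell
#!/bin/sh
# Hypothetical helper: print the shared-library dependencies of a binary
# or .so file that the dynamic linker cannot resolve.
missing_libs() {
  # ldd marks unresolved dependencies with "not found".
  # "|| true" keeps "sh -e" from aborting when grep matches nothing.
  ldd "$1" 2>/dev/null | grep 'not found' || true
}

# Path taken from the dyn.load error message
missing_libs /__w/_temp/Library/git2r/libs/git2r.so
```

If the helper prints nothing, every dependency resolved (or the file could not be inspected at all). In the situation described here, one would expect libgit2.so.28 to be listed.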

Anirban166 commented 8 months ago

Thus, I proceeded to install the latest version of git2r by first cloning the GitHub repository and then building the source via make:

git clone https://github.com/ropensci/git2r.git
cd git2r && make install
Cloning into 'git2r'...
cd .. && R CMD INSTALL git2r
* installing to library ‘/__w/_temp/Library’
* installing *source* package ‘git2r’ ...
** using staged installation
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking for pkg-config... /usr/bin/pkg-config
checking pkg-config is at least version 0.9.0... yes
checking whether the libgit2 version will work in git2r... no
configure: error: in `/__w/<repository>/<repository>/git2r':

  -----------------------------------------------------------------------

   Unable to find 'libgit2 >= 0.26.0' on this system, please install:
     libgit2-dev   (package on e.g. Debian and Ubuntu)
     libgit2-devel (package on e.g. Fedora, CentOS and RHEL)
     libgit2       (Homebrew package on OS X)
   and try again.

   If the libgit2 library is installed on your system but the git2r
   configuration is unable to find it, you can specify the include and
   lib path to libgit2 with:

   given you downloaded a tar-gz archive:
   R CMD INSTALL git2r-.tar.gz --configure-vars='INCLUDE_DIR=/path/to/include LIB_DIR=/path/to/lib'

   or cloned the GitHub git2r repository into a directory:
   R CMD INSTALL git2r/ --configure-vars='INCLUDE_DIR=/path/to/include LIB_DIR=/path/to/lib'

   or download and install git2r in R using
   install.packages('git2r', type='source', configure.vars='LIB_DIR=-L/path/to/libs INCLUDE_DIR=-I/path/to/headers')

   On macOS, another possibility is to let the configuration
   automatically download the libgit2 library from the Homebrew
   package manager with:

   R CMD INSTALL git2r-.tar.gz --configure-vars='autobrew=yes'
   or
configure: error: package dependency requirement 'libgit2 >= 0.26.0' could not be satisfied.
See `config.log' for more details
   R CMD INSTALL git2r/ --configure-vars='autobrew=yes'
   or
   install.packages('git2r', type='source', configure.vars='autobrew=yes')

  -----------------------------------------------------------------------

ERROR: configuration failed for package ‘git2r’
* removing ‘/__w/_temp/Library/git2r’
make: *** [Makefile:10: install] Error 1
Error: Process completed with exit code 2.

Apparently, I had to install libgit2 as well, since it was not found on my runner's system.
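The configure script's requirement can be checked up front with pkg-config, which is the same tool the configure log shows being used. This is a sketch under assumptions: `have_libgit2` is a hypothetical helper name, and the suggested package name is the Debian/Ubuntu one from the error text above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if pkg-config can find libgit2 at or
# above the given minimum version (0.26.0 in the configure error).
have_libgit2() {
  min_version="${1:-0.26.0}"
  # Treat a missing pkg-config the same as a missing libgit2.
  command -v pkg-config >/dev/null 2>&1 || return 1
  pkg-config --atleast-version="$min_version" libgit2 2>/dev/null
}

if have_libgit2 0.26.0; then
  echo "libgit2 found: $(pkg-config --modversion libgit2)"
else
  echo "libgit2 >= 0.26.0 not found; on Debian/Ubuntu: sudo apt-get install -y libgit2-dev"
fi
```

Running this before `make install` for git2r would report the missing system library in one line instead of the full configure failure.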

Anirban166 commented 8 months ago

After successfully installing libgit2, everything fell into place:

Run sudo apt-get update -y
  sudo apt-get install -y libgit2-dev
  git clone https://github.com/Anirban166/data.table.git
  git cl
  shell: sh -e {0}
  env:
    R_LIBS_USER: /__w/_temp/Library
    TZ: UTC
    _R_CHECK_SYSTEM_CLOCK_: FALSE
    NOT_CRAN: true
    RSPM: https://packagemanager.posit.co/cran/__linux__/focal/latest
    RENV_CONFIG_REPOS_OVERRIDE: https://packagemanager.posit.co/cran/__linux__/focal/latest
    GITHUB_PAT: ***
Hit:1 https://apt.releases.hashicorp.com focal InRelease
Hit:2 https://deb.nodesource.com/node_16.x focal InRelease
Hit:4 http://security.ubuntu.com/ubuntu focal-security InRelease
Hit:5 http://archive.ubuntu.com/ubuntu focal InRelease
Hit:6 http://ppa.launchpad.net/git-core/ppa/ubuntu focal InRelease
Get:3 https://s3-us-east-2.amazonaws.com/dvc-s3-repo/deb stable InRelease [2,679 B]
Hit:7 http://archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:8 http://ppa.launchpad.net/longsleep/golang-backports/ubuntu focal InRelease
Hit:9 http://archive.ubuntu.com/ubuntu focal-backports InRelease
Ign:3 https://s3-us-east-2.amazonaws.com/dvc-s3-repo/deb stable InRelease
Fetched 2,679 B in 1s (2,290 B/s)

...

Setting up libgit2-28:amd64 (0.28.4+dfsg.1-2) ...
Setting up libgit2-dev:amd64 (0.28.4+dfsg.1-2) ...
Processing triggers for man-db (2.9.1-1) ...
Processing triggers for libc-bin (2.31-0ubuntu9.12) ...
Cloning into 'data.table'...
Script executed from: /__w/<repository>/<repository>
Cloning into 'git2r'...
cd .. && R CMD INSTALL git2r
* installing to library ‘/__w/_temp/Library’
* installing *source* package ‘git2r’ ...
** using staged installation
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking for pkg-config... /usr/bin/pkg-config
checking pkg-config is at least version 0.9.0... yes
Found pkg-config cflags and libs!
checking whether the libgit2 version will work in git2r... yes
checking whether the libgit2 function git_buf_dispose is available... yes
checking whether the libgit2 constant GIT_OBJECT_ANY is available... yes
checking whether the libgit2 function git_error_last is available... yes
checking whether the libgit2 function git_oid_is_zero is available... no
----- Results of the git2r package configure -----

  PKG_CFLAGS:  -DGIT2R_HAVE_BUF_DISPOSE -DGIT2R_HAVE_OBJECT_ANY -DGIT2R_HAVE_GIT_ERROR
  PKG_LIBS: -lgit2

--------------------------------------------------
configure: creating ./config.status
config.status: creating src/Makevars
** libs
using C compiler: ‘gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0’
make[1]: Entering directory '/__w/<repository>/<repository>/git2r/src'
gcc -I"/opt/R/4.3.2/lib/R/include" -DNDEBUG -DR_NO_REMAP -DSTRICT_R_HEADERS  -I/usr/local/include   -DGIT2R_HAVE_BUF_DISPOSE -DGIT2R_HAVE_OBJECT_ANY -DGIT2R_HAVE_GIT_ERROR -fpic  -g -O2  -c git2r.c -o git2r.o

...

** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (git2r)
* 
...

> remotes::install_github("tdhock/atime")
Using github PAT from envvar GITHUB_PAT
Downloading GitHub repo tdhock/atime@HEAD
Skipping 1 packages ahead of CRAN: git2r
profmem (NA -> 0.6.0) [CRAN]
bench   (NA -> 1.1.3) [CRAN]
Installing 2 packages: profmem, bench
Installing packages into ‘/__w/_temp/Library’

...

> library(atime); library(ggplot2); library(data.table); atime::atime_pkg("data.table")
* installing to library ‘/__w/_temp/Library’
* installing *source* package ‘data.table.4407a80ed9fbf24dfb5ccd71d6f8c8c8071b030d’ ...
** using staged installation
gcc 9.4.0
zlib 1.2.11 is available ok
R CMD SHLIB supports OpenMP without any extra hint
** libs
using C compiler: ‘gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0’
gcc -I"/opt/R/4.3.2/lib/R/include" -DNDEBUG   -I/usr/local/include   -fopenmp  -fpic  -g -O2  -c assign.c -o assign.o

...

* DONE (data.table.4407a80ed9fbf24dfb5ccd71d6f8c8c8071b030d)
Registered S3 methods overwritten by 'data.table.4407a80ed9fbf24dfb5ccd71d6f8c8c8071b030d':
  method                   from      
  all.equal.data.table     data.table

...

  droplevels.data.table    data.table
$shallow.4440
atime list with 10 measurements for
CRAN=1.15.0(N=1000 to 1e+07)
HEAD=master(N=1000 to 1e+07)
Anirban166 commented 8 months ago

After getting this to work, it was fairly straightforward to collect the plots from the output directory data.table/inst/atime and post them to the PR thread using the approach I documented in #5 (note that a repo_token must be used here instead of the commonplace GITHUB_PAT token, which is why I have it in a different step).
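The collection step can be sketched as a small listing helper. This is illustrative only: `list_plots` is a hypothetical name, and it assumes the plots are written as .png files directly inside the output directory. Posting them to the PR is not shown, since that relies on the approach from #5.

```shell
#!/bin/sh
# Hypothetical helper: list image files written to an output directory.
# Assumption: the plots are saved as .png files at the top level.
list_plots() {
  find "$1" -maxdepth 1 -type f -name '*.png' 2>/dev/null | sort
}

list_plots data.table/inst/atime
```

An empty result would suggest that atime::atime_pkg did not finish, or that it wrote its output elsewhere.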