Closed chainsawriot closed 1 year ago
I think it is best to find the closest commit to snapshot_date automatically, because not everybody will know what these random letters/numbers mean or where to get them. Here is a suggestion to obtain the closest commit SHA via gh:
get_sha <- function(repo, date){
  commits <- gh::gh(paste0("GET /repos/", repo, "/commits"), per_page = 100)
  dates <- sapply(commits, function(x) x$commit$committer$date)
  idx <- which(dates <= date)[1]
  k <- 2
  while(is.na(idx)){ # which(...)[1] yields NA (not NULL) when no commit on this page is old enough
    commits <- gh::gh(paste0("GET /repos/", repo, "/commits"), per_page = 100, page = k)
    dates <- sapply(commits, function(x) x$commit$committer$date)
    idx <- which(dates <= date)[1]
    k <- k + 1
  }
  commits[[idx]]$sha
}
repo <- "schochastics/netUtils"
date <- as.Date("2020-08-26")
get_sha(repo,date)
#> [1] "5e2f3ab53452f140312689da02d871ad58a96867"
Created on 2023-02-07 with reprex v2.0.2
The paging is probably still error-prone.
Just found pkgdepends; it might be helpful for getting the dependencies of GitHub-only packages?
library(pkgdepends)
pd <- new_pkg_deps("schochastics/levelnet@775cf5e")
pd$solve()
#> ! Using bundled GitHub PAT. Please add your own PAT using `gitcreds::gitcreds_set()`.
#> ℹ Loading metadata database
#> ✔ Loading metadata database ... done
#>
pd$draw()
#> schochastics/levelnet@775cf5e 0.5.0 [new][bld][cmp][dl] (unknown size)
#> ├─igraph 1.3.5 [new][bld][cmp][dl] (2.50 MB)
#> │ ├─magrittr 2.0.3 [new][bld][cmp][dl] (267.07 kB)
#> │ ├─Matrix 1.5-1 < 1.5-3 [old]
#> │ │ └─lattice 0.20-45
#> │ ├─pkgconfig 2.0.3 [new][bld][dl] (6.08 kB)
#> │ └─rlang 1.0.6 [new][bld][cmp][dl] (742.51 kB)
#> ├─Matrix
#> └─Rcpp 1.0.10 [new][bld][cmp][dl] (2.94 MB)
#>
#> Key: [new] new | [old] outdated | [dl] download | [bld] build | [cmp] compile
Created on 2023-02-07 with reprex v2.0.2
Apologies if this is irrelevant, but I am still not that familiar with the code base of gran :)
A way without pkgdepends could be this one:
get_sha <- function(repo, date){
  commits <- gh::gh(paste0("GET /repos/", repo, "/commits"), per_page = 100)
  dates <- sapply(commits, function(x) x$commit$committer$date)
  idx <- which(dates <= date)[1]
  k <- 2
  while(is.na(idx)){ # NA (not NULL) when no commit on this page is old enough
    commits <- gh::gh(paste0("GET /repos/", repo, "/commits"), per_page = 100, page = k)
    dates <- sapply(commits, function(x) x$commit$committer$date)
    idx <- which(dates <= date)[1]
    k <- k + 1
  }
  list(sha = commits[[idx]]$sha, x_pubdate = dates[[idx]])
}
repo <- "schochastics/netUtils"
snapshot_date <- "2020-08-26"
snapshot_date <- anytime::anytime(snapshot_date, tz = "UTC", asUTC = TRUE)
sha <- get_sha(repo,snapshot_date)
repo_descr <- gh::gh(paste0("GET /repos/",repo,"/contents/DESCRIPTION"),ref=sha$sha)
descr_df <- as.data.frame(read.dcf(url(repo_descr$download_url)))
descr_df
#> Package Title Version
#> 1 igraphUtils A Collection of Network Analytic Functions 0.1.0.9000
#> Authors@R
#> 1 person(given = "David",\nfamily = "Schoch",\nrole = c("aut", "cre"),\nemail = "david.schoch@manchester.ac.uk")
#> Description
#> 1 Provides a collection of network analytic functions that may not deserve a package on their own.
#> License Encoding LazyData Roxygen RoxygenNote
#> 1 MIT + file LICENSE UTF-8 true list(markdown = TRUE) 7.1.0
#> LinkingTo Imports
#> 1 Rcpp,\nRcppArmadillo Rcpp,\nigraph
No additional dependencies except gh, which one probably needs anyway, but we need to parse the DESCRIPTION dependency fields ourselves.
Created on 2023-02-08 with reprex v2.0.2
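As a sketch of that parsing step (the helper name and field list are my own, not part of any package; the input is assumed to look like descr_df above), the comma-separated dependency fields could be split like this:

```r
## Sketch (hypothetical helper): split the comma-separated dependency
## fields of a parsed DESCRIPTION into a vector of package names.
parse_deps <- function(descr_df, fields = c("Depends", "Imports", "LinkingTo", "Suggests")) {
  fields <- intersect(fields, colnames(descr_df))
  raw_fields <- vapply(fields, function(f) as.character(descr_df[[f]][1]), character(1))
  raw <- unlist(strsplit(raw_fields, ","))
  ## drop version requirements like "igraph (>= 1.3)" and surrounding whitespace
  deps <- unique(trimws(gsub("\\(.*\\)", "", raw)))
  setdiff(deps, c("R", ""))
}

descr_df <- data.frame(Imports = "Rcpp,\nigraph", LinkingTo = "Rcpp,\nRcppArmadillo")
parse_deps(descr_df)
```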
@schochastics So now you are a CTB.
I thought about using pkgdepends previously (#1), but decided not to use it because pkgsearch::cran_package_history provides enough information (for CRAN packages).
In the long run, I think we might be better off using pkgdepends (because it supports bioc etc.). Also, opening up gran to GitHub means opening up to DESCRIPTION fields such as Remotes. And pkgdepends supports these.
For now, I will take your get_sha and read.dcf approach.
Tag as v0.1 for now. Dunno if it can be made.
> @schochastics So now you are a CTB.

:)

> For now, I will take your get_sha and read.dcf approach.
Do you want to take over integrating this into the package? Otherwise I'll give it a shot.
@schochastics Please give it a shot (and be AUT)!
I have a working version in my fork in the gh branch. The problem is system requirements; I am not sure we can get these reliably from the DESCRIPTION. Example: igraph's DESCRIPTION:
SystemRequirements:
gmp (optional),
libxml2 (optional),
glpk (>= 4.57, optional)
R> remotes::system_requirements(package = "igraph",os="ubuntu",os_release="20.04")
[1] "apt-get install -y libglpk-dev" "apt-get install -y libgmp3-dev"
[3] "apt-get install -y libxml2-dev"
@schochastics For now, an interim solution is to put the names of non-CRAN packages in a special slot inside the granlist object (e.g. output$noncran_pkgs; I don't want to call it gh_pkgs because we might need to include bioc or even local packages in the future). And probably, those non-CRAN packages would only be in output$grans[[x]]$original (but not output$grans[[x]]$deps, if we don't support those nonstandard DESCRIPTION fields for now). Those non-CRAN packages need special treatment anyway for export_granlist (probably they will need to be installed last, preferably being cached).
When getting sysreqs, the packages in noncran_pkgs need to be separated from CRAN packages. For CRAN packages, do the usual remotes::system_requirements thing.
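A minimal sketch of that separation (the function name is hypothetical; noncran_pkgs is the proposed slot from above):

```r
## Sketch: split a vector of resolved package names into CRAN and
## non-CRAN sets, given the proposed noncran_pkgs slot.
split_by_source <- function(pkgs, noncran_pkgs) {
  list(cran = setdiff(pkgs, noncran_pkgs),
       noncran = intersect(pkgs, noncran_pkgs))
}

split_by_source(c("igraph", "rtoot", "igraphUtils"),
                noncran_pkgs = c("rtoot", "igraphUtils"))
## the `cran` element could then be fed to remotes::system_requirements()
```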
For gh packages, we need to get their DESCRIPTION again (or, if we can, cache the DESCRIPTION file from the previous step) and do this:
Thanks, I'll try to get this done.
Different question: how would you indicate a gh package when providing a list of packages to resolve? My current implementation is to interpret everything with a "/" as coming from GitHub:
resolve(c("rtoot", "schochastics/rtoot"))
calls .get_snapshot_dependencies_cran() for rtoot and .get_snapshot_dependencies_gh() for schochastics/rtoot. Not sure if this is the best way, but it is certainly the simplest?
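That dispatch rule could be sketched like this (the helper name is made up; the actual resolver internals may differ):

```r
## Sketch: everything containing a "/" is treated as a GitHub slug,
## everything else as a CRAN package name.
.detect_source <- function(pkg) {
  if (grepl("/", pkg, fixed = TRUE)) "github" else "cran"
}

vapply(c("rtoot", "schochastics/rtoot"), .detect_source, character(1))
```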
@schochastics
Slash is fine.
devtools <= 1.5 should use repo and username separately (as older versions, e.g. version < 1, only support username), while > 1.5 (which is 1.6.1 onwards, i.e. snapshot_date >= 2014-10-07) should use username/repo, because username is deprecated.
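For illustration, a sketch of picking the right call form from snapshot_date (the cutoff date is taken from the comment above; the helper name is made up):

```r
## Sketch: choose the install_github() syntax that the devtools version
## available at snapshot_date understands. devtools 1.6.1 (2014-10-07)
## deprecated the separate `username` argument.
.gh_install_call <- function(handle, snapshot_date) {
  parts <- strsplit(handle, "/", fixed = TRUE)[[1]]
  if (as.Date(snapshot_date) >= as.Date("2014-10-07")) {
    sprintf('devtools::install_github("%s")', handle)
  } else {
    sprintf('devtools::install_github(repo = "%s", username = "%s")',
            parts[2], parts[1])
  }
}

.gh_install_call("schochastics/rtoot", "2020-08-26")
.gh_install_call("schochastics/rtoot", "2014-01-01")
```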
Ah, I remember that change! I can fix that.
Archiving GH packages
https://api.github.com/repos/schochastics/rtoot/tarball/50420ed
And then R CMD build it?
I guess that way we could get around the devtools issue?
One can install that tarball directly?!?!
R> install.packages("~/Downloads/schochastics-rtoot-v0.2.0-11-g50420ed.tar.gz")
Installing package into ‘/home/david/R/x86_64-pc-linux-gnu-library/4.2’
(as ‘lib’ is unspecified)
inferring 'repos = NULL' from 'pkgs'
Warning in untar2(tarfile, files, list, exdir, restore_times) :
skipping pax global extended headers
* installing *source* package ‘rtoot’ ...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
*** copying figures
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (rtoot)
Something tells me that this is probably not a good idea
@schochastics Yes. The R CMD build step afterwards is simply for getting rid of the unnecessary files specified in .Rbuildignore, building vignettes, checking and all those sundries. It is not really necessary for many (well-developed) packages such as rtoot.
My proposal above was mainly for dealing with the cache option of dockerize. But if it can be generalized and avoids the need for devtools/remotes just for install_github, that would be tremendously helpful.
And this is the super hacky version of devtools::install_github without any dependency, for use inside the container.
(R has a function for untarring, but it is super buggy before R 4; I have direct bad experience with it.)
(This also inspires me that the limitation of R > 2.1 #14 can be eliminated by doing a stupid thing like: system(command = paste("R CMD INSTALL", tarball_path)). Again, system2 is nicer but is a recent phenomenon.)
pkg <- "schochastics/rtoot"
sha <- "50420ed"
x <- tempfile(fileext = ".tar.gz")
y <- tempdir(check = TRUE)
download.file(paste("https://api.github.com/repos/", pkg, "/tarball/", sha, sep = ""), destfile = x) ## one concern is that Woody can't do proper https authentication; but actually http works as well
system(command = paste("tar", "-zxf", x, "-C", y))
system(command = paste("R", "CMD", "build", list.dirs(path = y, recursive = FALSE))) # There can be multiple directories if y is reused.
## TODO: Need a way to generate `tarball_path`
tarball_path <- "rtoot_0.2.0.9000.tar.gz"
install.packages(tarball_path, repos = NULL)
unlink(tarball_path)
It also brings us to another issue: should we store x, x_version as usual for GH packages, i.e. package name and version as per DESCRIPTION? So that we can generate tarball_path as usual. It is also beneficial for cases such as igraphUtils / netUtils.
We can store x, "schochastics/rtoot" and sha somewhere else, e.g. my suggestion: cranlist$noncran_pkgs as a vector/dataframe.
## `type` can be extended to "bioc", "local"
## handle can be github path, local path, or bioc package name
## local probably doesn't need ref, bioc might store version as ref.
data.frame(x = c("rtoot", "igraphUtils"), type = c("github", "github"), handle = c("schochastics/rtoot", "schochastics/netUtils"), ref = c("50420ed", "5e2f3ab"))
Another way is to stay as it is now and look at the DESCRIPTION once again in y to get the ACTUAL name and version during the container building time.
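That second way could look roughly like this (a sketch; the helper name is made up):

```r
## Sketch: derive the real package name and version from the unpacked
## sources, so tarball_path can be generated even when the repo name and
## the package name differ (e.g. netUtils was once igraphUtils).
.tarball_name_from_dir <- function(pkg_dir) {
  descr <- read.dcf(file.path(pkg_dir, "DESCRIPTION"))
  paste0(descr[1, "Package"], "_", descr[1, "Version"], ".tar.gz")
}
```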
Just finished almost exactly the same hack-ish solution.
If we can avoid certain issues with system, why not use it? The "R CMD" stuff is probably the most stable thing we have?
One could get the name of the created tar file like this, but no idea how stable this really is:
res <- system(command = paste("R", "CMD", "build", list.dirs(path = y, recursive = FALSE)), intern = TRUE)
tar_file_line <- res[grepl("\\.tar\\.gz", res)]
tar_file_line
# regex to extract the tar.gz file name from this line
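A possible extraction step (a sketch; the exact wording of the R CMD build log line is an assumption and may vary across R versions):

```r
## Sketch: pull "pkg_version.tar.gz" out of the captured build log.
.extract_tarball <- function(res) {
  tar_file_line <- res[grepl("\\.tar\\.gz", res)][1]
  regmatches(tar_file_line,
             regexpr("[A-Za-z0-9.]+_[0-9][0-9.-]*\\.tar\\.gz", tar_file_line))
}

.extract_tarball(c("* checking for file ...",
                   "* building 'rtoot_0.2.0.9000.tar.gz'"))
```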
I will work on this a bit more. devtools is a bit of a pain with its dependencies.
> We can store x, "schochastics/rtoot" and sha somewhere else, e.g. my suggestion: cranlist$noncran_pkgs as a vector/dataframe.
I will toy around with this, but I noticed that it quickly becomes complicated to drag along everything. The sha is enough to recreate what we need, though maybe in a cumbersome way.
Creating that dataframe might, however, be helpful just as a reference.
This can deal with pkg renaming. Obviously it still needs some error handling:
pkg <- "schochastics/igraphUtils"
sha <- "1b601a3"
x <- tempfile(fileext = ".tar.gz")
y <- tempdir(check = TRUE)
download.file(paste("https://api.github.com/repos/", pkg, "/tarball/", sha, sep = ""), destfile = x)
system(command = paste("tar", "-zxf", x, "-C", y))
dlist <- list.dirs(path = y, recursive = FALSE)
pkg_dir <- dlist[grepl(sha, dlist)] # the sha allows to identify the dir uniquely
res <- system(command = paste("cd ", y, " && R", "CMD", "build", pkg_dir), intern = TRUE)
tar_file_line <- res[grepl("\\.tar\\.gz", res)]
flist <- list.files(y, pattern = "tar.gz", recursive = FALSE)
tarball_path <- paste0(y, "/", flist[vapply(flist, function(x) any(grepl(x, res)), logical(1))])
install.packages(tarball_path, repos = NULL)
unlink(tarball_path)
So, let's make it like that in header.R for now.
We don't need a lot of error handling in the container building part. It's better to err when things go wrong there.
5291eae
This is the "for the sake of argument" test case:
x <- resolve("cran/sna", "2005-05-01")
It will generate the earliest supported version of R (2.1.0), but with a GitHub package.
This happens in the v0.1 branch when dockerizing the above:
FROM debian/eol:
ENV TZ UTC
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone && apt-get update -qq && apt-get install wget locales build-essential r-base-dev -y
COPY rang.R ./rang.R
COPY compile_r.sh ./compile_r.sh
RUN apt-get update -qq && apt-get install -y libfreetype6-dev libgl1-mesa-dev libglu1-mesa-dev libicu-dev libpng-dev make pandoc zlib1g-dev
RUN bash compile_r.sh 2.1.0
CMD ["R"]
Sending build context to Docker daemon 10.75kB
Step 1/8 : FROM debian/eol:
invalid reference format
Looks like debian_version is missing.
The GitHub download is not possible inside Woody. We need to warn the users and ask them to cache instead.
And really old packages can't be built on modern R, e.g. cran/sna. We need to download them now, transfer them inside the container, and build them there instead. (So complicated...)
I think this is done.
Or even to find the closest commit to snapshot_date. But gran needs to be able to emit a gran object for a GitHub package.