Closed. zeileis closed this issue 1 year ago.
https://davidwalsh.name/get-default-branch-name suggests running `git remote show` and extracting the `HEAD branch` value from its output. https://usethis.r-lib.org/reference/git-default-branch.html suggests an approach via usethis. Both untried.
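A minimal sketch of the git-based route in R. It swaps in `git ls-remote --symref <url> HEAD` instead of parsing `git remote show`, because `ls-remote` asks the remote for its `HEAD` symref without needing a local clone. It still assumes `git` is installed and on the `PATH`; `parse_symref()` and `default_branch_git()` are hypothetical helper names.

```r
# Hypothetical helper: extract the default branch from `git ls-remote --symref` output.
# The relevant line looks like "ref: refs/heads/main<TAB>HEAD".
parse_symref <- function(lines) {
  ref <- grep("^ref: refs/heads/", lines, value = TRUE)
  if (length(ref) == 0) return(NA_character_)
  # Keep only the branch name between the "refs/heads/" prefix and the tab
  sub("^ref: refs/heads/([^\t]+)\t.*$", "\\1", ref[1])
}

# Hypothetical wrapper: queries the remote directly, so no clone is needed
# (requires git and network access)
default_branch_git <- function(url) {
  out <- system2("git", c("ls-remote", "--symref", url, "HEAD"), stdout = TRUE)
  parse_symref(out)
}

# Example:
# default_branch_git("https://github.com/cran-task-views/Finance.git")
```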
Thanks. The first requires `git` to be available and, I think, the second requires the repositories to actually be cloned. But maybe I'm missing something. A unified name would still be preferable.
From the prompt:
$ curl https://api.github.com/repos/cran-task-views/Spatial/branches
[
{
"name": "jnrev",
"commit": {
"sha": "259c62603e8bbe5358211a3006b8ae040dded1ad",
"url": "https://api.github.com/repos/cran-task-views/Spatial/commits/259c62603e8bbe5358211a3006b8ae040dded1ad"
},
"protected": false
},
{
"name": "main",
"commit": {
"sha": "bf431b6e04395e818673b2467c750da268e735d0",
"url": "https://api.github.com/repos/cran-task-views/Spatial/commits/bf431b6e04395e818673b2467c750da268e735d0"
},
"protected": false
}
]
In R with `curl`:
> library(curl)
> req <- curl_fetch_memory("https://api.github.com/repos/cran-task-views/Spatial/branches")
> jsonlite::prettify(rawToChar(req$content))
[
{
"name": "jnrev",
"commit": {
"sha": "259c62603e8bbe5358211a3006b8ae040dded1ad",
"url": "https://api.github.com/repos/cran-task-views/Spatial/commits/259c62603e8bbe5358211a3006b8ae040dded1ad"
},
"protected": false
},
{
"name": "main",
"commit": {
"sha": "bf431b6e04395e818673b2467c750da268e735d0",
"url": "https://api.github.com/repos/cran-task-views/Spatial/commits/bf431b6e04395e818673b2467c750da268e735d0"
},
"protected": false
}
]
> grep("main", rawToChar(req$content))
[1] 1
> grep("master", rawToChar(req$content))
integer(0)
> req <- curl_fetch_memory("https://api.github.com/repos/cran-task-views/HighPerformanceComputing/branches")
> grep("main", rawToChar(req$content))
integer(0)
> grep("master", rawToChar(req$content))
[1] 1
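Grepping the raw text only tells whether a string occurs anywhere in the response (and a repository can have both a `main` and a `master` branch). A slightly sturdier sketch parses the JSON instead: branch names come from the `/branches` listing shown above, and the single-repository endpoint `GET /repos/{owner}/{repo}` of the GitHub REST API reports a `default_branch` field directly. The helper names are hypothetical, and unauthenticated requests are subject to GitHub's rate limits.

```r
library(curl)
library(jsonlite)

# Hypothetical helper: branch names from the JSON returned by .../branches
branch_names <- function(json) {
  fromJSON(json)$name
}

# Hypothetical helper: default branch from the JSON returned by /repos/{owner}/{repo}
default_branch_from_json <- function(json) {
  fromJSON(json)$default_branch
}

# Hypothetical network wrapper (needs access to api.github.com)
default_branch_api <- function(repo, owner = "cran-task-views") {
  url <- sprintf("https://api.github.com/repos/%s/%s", owner, repo)
  req <- curl_fetch_memory(url)
  stopifnot(req$status_code == 200)
  default_branch_from_json(rawToChar(req$content))
}

# Example:
# default_branch_api("HighPerformanceComputing")
```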
I tried with `RCurl`, but I do not know how to set the User-Agent header that GitHub asks for: Request forbidden by administrative rules. Please make sure your request has a User-Agent header (http://developer.github.com/v3/#user-agent-required). Check https://developer.github.com for other possible causes.
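With `RCurl`, the header can be supplied as a curl option on the request. A minimal sketch, assuming that any descriptive User-Agent string satisfies the GitHub API; `fetch_github_rcurl()` and the default agent string are hypothetical.

```r
# Hypothetical helper: fetch a GitHub API URL with RCurl, passing the
# `useragent` curl option so that the request carries a User-Agent header
fetch_github_rcurl <- function(url, agent = "ctv-download-script") {
  RCurl::getURL(url, useragent = agent)
}

# Example (needs RCurl and network access):
# txt <- fetch_github_rcurl("https://api.github.com/repos/cran-task-views/Spatial/branches")
# jsonlite::prettify(txt)
```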
You likely want to look into setting a GITHUB_PAT with the proper credentials; see the GitHub docs on creating a PAT.
After that the `gh` package works. I keep some helper scripts around to 'walk' through my course org to find my student projects; starting from those gives a fairly straightforward solution _given that I have a GITHUB_PAT in my environment variables_:
res <- gh::gh("GET /orgs/:org/repos", org = "cran-task-views", .limit = 200)  # ask for info on repos in org
# browse via `str(res[[1]])`, say
do.call(rbind, lapply(res, \(x) data.frame(name = x$name, default_branch = x$default_branch)))
and that gets us:
> res <- gh::gh("GET /orgs/:org/repos", org="cran-task-views", .limit=200)
> do.call(rbind, lapply(res, \(x) data.frame(name=x$name, default_branch=x$default_branch)))
name default_branch
1 WebTechnologies main
2 ModelDeployment main
3 Hydrology main
4 Databases main
5 ctv main
6 Econometrics main
7 ctv-from-svn-2021-09 master
8 ctv-from-svn master
9 Bayesian main
10 ChemPhys main
11 Cluster main
12 Environmetrics main
13 ExperimentalDesign main
14 Finance master
15 FunctionalData main
16 GraphicalModels main
17 HighPerformanceComputing master
18 Hydrology-R-Forge main
19 MachineLearning main
20 MedicalImaging main
21 MetaAnalysis main
22 MissingData main
23 NaturalLanguageProcessing main
24 OfficialStatistics main
25 Pharmacokinetics main
26 Psychometrics main
27 ReproducibleResearch main
28 Spatial main
29 Survival main
30 TeachingStatistics main
31 TimeSeries main
32 Tracking main
33 ClinicalTrials main
34 DifferentialEquations main
35 Distributions main
36 ExtremeValue main
37 Genetics main
38 NumericalMathematics main
39 Optimization main
40 Phylogenetics main
41 Robust main
42 SpatioTemporal main
43 WebTechnologies-R-Forge main
>
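Building on the `gh` call above, a small sketch that keeps only the repositories whose default branch is not `main`. It assumes `branches` is the data frame produced by the `do.call(rbind, ...)` line; `non_default()` is a hypothetical helper. Applied to the listing above, it would keep `ctv-from-svn-2021-09`, `ctv-from-svn`, `Finance` and `HighPerformanceComputing`.

```r
# Hypothetical helper: rows whose default branch differs from `expected`
non_default <- function(branches, expected = "main") {
  branches[branches$default_branch != expected, , drop = FALSE]
}

# Usage, assuming a GITHUB_PAT is set:
# res <- gh::gh("GET /orgs/:org/repos", org = "cran-task-views", .limit = 200)
# branches <- do.call(rbind, lapply(res, \(x)
#   data.frame(name = x$name, default_branch = x$default_branch)))
# non_default(branches)
```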
Now, to get just that one file from each repo, if it were me I would simply fetch the repo's zip archive and extract it. No PAT, no pain:
> tf <- tempfile()
> download.file("https://github.com/cran-task-views/WebTechnologies/archive/refs/heads/main.zip", tf)
trying URL 'https://github.com/cran-task-views/WebTechnologies/archive/refs/heads/main.zip'
downloaded 17 KB
> unzip(tf, files="WebTechnologies-main/WebTechnologies.md")
> head(readLines("WebTechnologies-main/WebTechnologies.md"))
[1] "---"
[2] "name: WebTechnologies"
[3] "topic: Web Technologies and Services"
[4] "maintainer: Mauricio Vargas Sepulveda"
[5] "email: mavargas11@uc.cl"
[6] "version: 2022-01-23"
>
That still has the branch name in the directory path, but that is easy to read off locally.
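Since the top-level directory inside the archive is named `<repo>-<branch>/`, one way to read it off locally is to look the file up in the archive listing by its name rather than hard-coding `WebTechnologies-main/`. A minimal sketch; `find_view_md()` is a hypothetical helper, and the listing comes from `unzip(..., list = TRUE)`.

```r
# Hypothetical helper: path of "<view>.md" one level below the archive's
# top-level "<repo>-<branch>/" directory, whatever the branch is called
find_view_md <- function(paths, view) {
  hit <- paths[grepl(paste0("^[^/]+/", view, "\\.md$"), paths)]
  hit[1]  # NA if the file is not in the archive
}

# Usage with the archive downloaded above:
# paths <- unzip(tf, list = TRUE)$Name
# md <- find_view_md(paths, "WebTechnologies")
# unzip(tf, files = md)
# head(readLines(md))
```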
Or, of course, `git clone`. The discussion here amounts to reinventing the protocol. I use something like the `gh` call above to find the list of (student) repos, then filter out the admin ones, and then run `git clone` on a first pass (when no target directory exists yet) and `git pull` per repo on all subsequent runs. That is perfect for updating all repos prior to marking etc. Same here: the list of repos will not change often, nor will the content. So asking `git` to get us changed content is more or less what `git` was invented for, and there is probably no good reason to reinvent it :wink:
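The clone-on-first-pass, pull-afterwards routine can be sketched in R as below. It assumes `git` is installed and that each repository is cloned into a directory named after it (the `git clone` default); `git_args()` and `sync_repo()` are hypothetical helpers. Because `git pull` updates the branch checked out at clone time, which is the remote's default, the branch name never has to be known.

```r
# Hypothetical helper: git arguments for one repo, depending on whether a clone exists
git_args <- function(repo, dir_exists, owner = "cran-task-views") {
  if (dir_exists) {
    c("-C", repo, "pull")  # subsequent runs: update the existing clone
  } else {
    c("clone", sprintf("https://github.com/%s/%s.git", owner, repo))  # first pass
  }
}

# Hypothetical wrapper: returns TRUE (invisibly) if git exited successfully
sync_repo <- function(repo) {
  status <- system2("git", git_args(repo, dir.exists(repo)))
  invisible(status == 0)
}

# Usage (needs git and network access):
# for (r in c("Spatial", "Finance")) sync_repo(r)
```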
Or you could just change from `master` to `main`. All of the solutions above are much more involved than the solution we have set up so far. And clearly a unified default branch name would be more transparent anyway.
Pulling would be good as long as the repositories to pull from are unchanged. But it gets more involved when we establish new task views or task views get archived. In the old R-Forge-based setup that required a person with CRAN access to change the list of task views there. However, the new script works without CRAN access.
And I'm sure that there are ways to work around that automatically as well. But I don't think that it is worth the effort just to accommodate a non-`main` default branch name for some maintainers.
If you `git clone` and then `git pull`, it is orthogonal to what the default branch is called. The file will just be there. Works for me, but preferences differ.
We have now implemented the latter solution (`git clone` & `git pull`).
Nevertheless, I'm in favor of a simple/unified/uncontroversial `main` default branch. I don't think we have seen arguments against that.
You can use `HEAD` as the ref and it will correspond to whatever the default branch is for that repo.
It seems to work in the type of URL you're interested in at the top:
https://raw.githubusercontent.com/cran-task-views/TimeSeries/HEAD/TimeSeries.md (`main` is the default branch)
https://raw.githubusercontent.com/cran-task-views/Finance/HEAD/Finance.md (`master` is the default branch)
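Following this hint, a minimal sketch that builds the `HEAD`-based raw URL for a task view and reads the file, so no branch name has to be known. The URL pattern is the one shown in the two examples above; `raw_md_url()` and `read_view()` are hypothetical helpers.

```r
# Hypothetical helper: raw URL of "<view>.md" on the repo's default branch, via HEAD
raw_md_url <- function(view, owner = "cran-task-views") {
  sprintf("https://raw.githubusercontent.com/%s/%s/HEAD/%s.md", owner, view, view)
}

# Hypothetical wrapper: read one task view file (needs network access)
read_view <- function(view) {
  readLines(raw_md_url(view), warn = FALSE)
}

# Usage:
# md <- lapply(c("TimeSeries", "Finance"), read_view)
```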
Nice! Thanks for the hint, Jenny.
Closing this for now, given it wasn't active for a year. Will revisit when we put together guidelines for task view maintainers.
Is there a simple way to find out what the default branch of a GitHub repository is called? I'm asking because the CRAN script we're planning to use downloads all .md files from
where `%s` is substituted by the task view name. I have used the `main` branch because some groups find `master` offensive (due to its connection to slavery) and I find `main` the better description anyway. The only task view maintainer who objected to this is you, Dirk @eddelbuettel. Would you please reconsider your decision or provide some way for us to determine the name of the main/master branch?