Open zkamvar opened 1 year ago
We indeed use gh
to get the available releases (especially the latest) and their download URL from API
https://github.com/cderv/pandoc/blob/638f1c694ee286d253fcd469b5312cfcb5d6316e/R/pandoc-install.R#L278-L288
I don't think the github API can be access with SSH only. So I am not sure how to offer a fallback when no Github Token.
Is setting a Github PAT too complicated and not possible here ?
I can add some guidance in doc or error message about how to setup a PAT. Like pointing toward usethis helpers: https://usethis.r-lib.org/articles/git-credentials.html#get-a-personal-access-token-pat
This PAT think with Github is one of the reason the recommandation from usethis is to use https and not ssh : https://usethis.r-lib.org/articles/git-credentials.html#https-vs-ssh
If you authenticate via SSH for “regular” Git work, you still have to set up a PAT for work that uses the REST API.
Theoretically there could be some solution (like scraping a release page using a url), but honestly using the Github API is the most reliable and easiest to maintain.
Internally there is a caching mechanism per session to store the queried the database, so shipping a cached version inside the package could be possible but
So not ideal.
Another idea would be to have free fallback mode where you pass a version number and we try()
to download without checking for availability and URL validation querying the API first.
Not ideal either.
Thank you for the detailed response, Chris!
Is setting a Github PAT too complicated and not possible here ?
In some ways, yes. Two years ago, there was a big discussion in The Carpentries community around the use of SSH vs PAT in teaching novices, and the community decided that SSH offered fewer complications (see https://github.com/swcarpentry/git-novice/issues/778#issuecomment-880770975 for an example of the cognative load offered by SSH vs PAT). I have even written guidance myself for this, but find that it's one of those things that people just find too complicated: https://carpentries.github.io/sandpaper-docs/github-pat.html
My goal was to be able to use the {pandoc} package as a method for helping people bootstrap The Carpentries Workbench directly from R.
I can add some guidance in doc or error message about how to setup a PAT. Like pointing toward usethis helpers: https://usethis.r-lib.org/articles/git-credentials.html#get-a-personal-access-token-pat
This PAT think with Github is one of the reason the recommandation from usethis is to use https and not ssh : https://usethis.r-lib.org/articles/git-credentials.html#https-vs-ssh
If you authenticate via SSH for “regular” Git work, you still have to set up a PAT for work that uses the REST API.
I think this is a very sensible solution.
Another idea would be to have free fallback mode where you pass a version number and we try() to download without checking for availability and URL validation querying the API first.
This is the method I was thinking about. It's not ideal or perfect, but at least it gives people the ability to install pandoc from a known version number.
Another method is to set up a GitHub account like https://github.com/testingjerry (as used in the R-universe docker container) and generate a few permission-less PATs, obfuscate them, and randomly select one when someone does not have a PAT set up
Another method could be to continously cache the database of pandoc release tags in this repository as a text file and then have a function that would locally store and update that text file on the user's machine in the appdir folder (which is a kind of generic functionality I'm hoping someone has already solved). This way, if the user does not have the PAT, you could query the local text file and then fetch the version. Again, not ideal because of lag and need for the user to run the function that updates the file, but it's better than the try()
and hope.
I don't think the github API can be access with SSH only. So I am not sure how to offer a fallback when no Github Token.
Without a Github PAT, I think we can still access the Github API. The only problem is that the rate limit (per IP) will be stricter, e.g., this is how to list the tags in the jgm/pandoc
repo without a Github token:
xfun::github_releases('jgm/pandoc', pattern = '^[0-9.]+$')
# lower-level call
xfun:::rest_api_raw('https://api.github.com', '/repos/jgm/pandoc/tags', '', list(per_page = 100, page = 1))
Installing Pandoc from Github releases is very similar to installing Hugo in blogdown. For blogdown, I don't require Github PAT or SSH configurations (if a PAT is provided, it will be used). You can take a look at the source code of blogdown::hugo_installers
, which fetches the download links of all release assets. I can factor out some of its code into a function that can be reused by other developers.
Thanks both for your inputs! It seems we can do without a PAT after all. I'll have a look at this.
the community decided that SSH offered fewer complications
My two cents: As a Windows user, SSH was always a bit complicated to setup or used, as not that common in the Windows world. 😅
My two cents: As a Windows user, SSH was always a bit complicated to setup or used, as not that common in the Windows world.
Honestly, I'm on your side of this, but the people have spoken and the cries for the old ways combined with the fear of megacorporations controlling everything won out since this is one of those processes where you set something once and you are done 🤷🏼
Thank you for considering the option!
@zkamvar would it be possible to share the exact error message that was encountered ?
Looking into this, I believe our current solution using gh::gh()
works also with no token set
This means that
gh::gh(
"GET /repos/:owner/:repo/releases",
owner = "jgm",
repo = "pandoc",
.progress = FALSE
)
should work even if no GITHUB_PAT or GITHUB_TOKEN env var set
Otherwise, using
gh::gh(
"GET /repos/:owner/:repo/releases",
owner = "jgm",
repo = "pandoc",
.progress = FALSE,
.token = ""
)
Insure no token use.
I am wondering if there is something happening with the fact that SSH if configured for GIT, or if the error is linked to API rate limitation anyway.
Thank you !
Ah good point. 😓 It turns out it was a 401 response; I think it genuinely was bad credentials. I seem to remember GitHub throwing this response during some operations when there were no credentials required.
Error in gh_process_response(raw) :
GitHub API error (401):
Message: Bad credentials
Read more at https://docs.github.com/rest
If I create a PR with a tryCatch for the 401 error to re-attempt without a token, would that be acceptable?
Bad credential means potentially a wrong PAT string is being passed.
You get this from the above gh::gh()
example I shared ? With both of them including the one with .token = ""
?
For the /releases
endpoint, it works without credentials, and I get this when rate limit is reached (60 max when no PAT)
> gh::gh(
+ "GET /repos/:owner/:repo/releases",
+ owner = "jgm",
+ repo = "pandoc",
+ .progress = FALSE,
+ .token = ""
+ )
Error in `gh::gh()`:
! GitHub API error (403): API rate limit exceeded for
146.70.194.99. (But here's the good news: Authenticated requests get
a higher rate limit. Check out the documentation for more details.)
ℹ Read more at
<https://docs.github.com/rest/overview/resources-in-the-rest-api#rate-limiting>
Run `rlang::last_trace()` to see where the error occurred.
So 403 error, not 401.
So I am curious with which call exactly you get this error, so that we could see why you get that.
If you want to check token, gh::gh_token()
will show you what is seen by gh. Should be ""
if you have really no PAT used.
Internally gh is using https://gitcreds.r-lib.org/ and gitcreds::gitcreds_get()
is the function used.
Maybe the SSH credentials are messing up the logic for gh which expect a PAT. 🤷
If I had more information about the state of the token and ssh, I would gladly give it, but the only information I had was a screenshot from the user who has since updated their computer. The command that generated the error was pandoc::pandoc_install("2.19.2")
I did test the response with the empty token and I was able to successfully get the 200 status. The reason I suggest a 401 fallback is because for a lot of users, juggling tokens is difficult and they end up with a situation where they have an invalid token in their environment (or in their git setup) and no clue of how to remove it.
I think that people should be able to download pandoc, even if they have bad credentials. Yes, it's not a great state for them to be in, but that's not the problem this package is trying to solve. The implementation would look something like this:
res <- tryCatch(
expr = {
gh::gh(
"GET /repos/:owner/:repo/releases",
owner = "jgm",
repo = "pandoc",
.progress = FALSE,
.token = "nope" # simulate bad credentials
)
},
http_error_401 = function(e) {
message(e$message) # show that the message is from a 401 call
message("Bad credentials found; downloading without credentials")
gh::gh(
"GET /repos/:owner/:repo/releases",
owner = "jgm",
repo = "pandoc",
.progress = FALSE,
.token = "" # fallback from bad credentials:
)
}
)
#> GitHub API error (401): Bad credentials
#> Bad credentials found; downloading without credentials
class(res)
#> [1] "gh_response" "list"
length(res)
#> [1] 30
Created on 2023-08-24 with reprex v2.0.2
Thank you for your patience with this issue!
The reason I suggest a 401 fallback is because for a lot of users, juggling tokens is difficult and they end up with a situation where they have an invalid token in their environment (or in their git setup) and no clue of how to remove it.
I understand better what you mean ! I can probably do that yes. I don't how many users would it this, but it seems easy enough to do. Thanks for the idea!
I tried to get a colleague to use this package, but she used SSH verification for Git and does not have a PAT. She got an error that said that she needed a PAT to continue. I suspect it comes from the {gh} package, but I was wondering if there was a way to have a fallback when no GitHub token is avaialable?