j-i-l closed this issue 1 month ago
aaand we get a 403 again on a URL that resolves just fine in a browser: https://github.com/furrer-lab/abn/actions/runs/8701967813/job/23865029772#step:5:60

The URL in question: https://onlinelibrary.wiley.com/doi/10.1111/tbed.13672

With `curl -I` I also get `HTTP/2 403`, but in a browser the URL is valid. The URL used for the citation is https://doi.org/10.1111/tbed.13672, which gives a 302 when requested with `curl -I` and which would also be picked up by urlchecker.

This seems to be a known and open issue with urlchecker (see here).

My conclusion is that we cannot really use urlchecker as is. We should either ignore redirects from the domain doi.org or ignore 403s. The former seems the cleaner solution; however, CRAN will not like our doi.org/... URLs then.

@matteodelucchi what do you think about this?
I agree that ignoring redirects from doi.org is the cleaner option, but it has drawbacks for CRAN. Given that this issue is clearly on the side of urlchecker, I'd go for ignoring 403s until this has been fixed in urlchecker. We still get the printout of the urlchecker results and should have a manual look at it. I'd add a note about this to the contribution file.
Decision:

* Don't ignore 403s across the board. Filter the URLs with errors: if they start with `https://doi.org`, accept them as no error -> no escalation.
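The decision above could be sketched roughly as follows (function and variable names are illustrative, not urlchecker's actual API; a real implementation would operate on urlchecker's result table):

```python
# Sketch: treat a 403 as a non-error when the offending URL is a doi.org link,
# escalate every other error status.

def should_escalate(url: str, status: int) -> bool:
    """Return True if this URL-check result should fail the job."""
    if status == 403 and url.startswith("https://doi.org"):
        return False  # DOI links often 403 for non-browser clients; accept them
    return status >= 400

# Example results, using the URLs from this thread:
results = [
    ("https://doi.org/10.1111/tbed.13672", 403),                       # accepted
    ("https://onlinelibrary.wiley.com/doi/10.1111/tbed.13672", 403),   # escalated
    ("https://example.org/ok", 200),                                   # fine
]
failures = [url for url, status in results if should_escalate(url, status)]
print(failures)  # -> ['https://onlinelibrary.wiley.com/doi/10.1111/tbed.13672']
```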
If we replace the 403-causing URL with the doi.org/... URL of the paper in question, we also get a 403 back. curl tells us that we get a 302 (temporary redirect), which is fine for urlchecker; the 403 then presumably comes from the redirected URL again.

Basically, we get a 403 for a page that does exist. The reason could lie in the request headers, e.g. the user agent, but in any case this does not seem practical, as for the URL in question there currently seems to be no way to avoid the error.

To address this problem we could either add a specific exception (which seems a tedious approach) or allow the test to fail in some form. If we allow the test to fail, we can either use `continue-on-error` (or similar) for the job, which will lead to a :x: in the workflow but lets it terminate nevertheless, or we simply do not raise an exception when running the URL checks.

I'll go for the latter version without raising an exception for now. The URL check then becomes just a source of information, but for now this seems to be the most practical approach.
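For reference, the `continue-on-error` variant would look roughly like this in a workflow file (step name and command are illustrative, not the repo's actual job definition):

```yaml
# Illustrative GitHub Actions step; name and command are assumptions.
- name: Check URLs (informational)
  run: Rscript -e 'print(urlchecker::url_check("."))'
  continue-on-error: true  # the job continues even if this step fails
```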
I just learned that DOIs are not URLs (see here)... I will change the DOIs in the documentation according to this template asap.
Makes sense to use the proper \doi format. However, in the end, if one requests content via HTTPS, the requested address is just a URL; the DOI can be part of the URI, but the request will eventually lead to https://doi.org/..., and for our 403 problem we will end up, after a 302 from doi.org/..., again on the URL that gives a 403 when using curl (or similar).

Using \doi might fix our issue if urlchecker does not evaluate \doi{} entries (or evaluates them differently).
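For the record, the \doi markup would look roughly like this in a roxygen block (the surrounding reference text is illustrative; only the DOI is from this thread):

```r
#' @references
#' Some reference for the paper in question,
#' \doi{10.1111/tbed.13672}
```

This renders as a doi.org link in the built documentation without hard-coding the https://doi.org/... URL.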
Hmm... I see...
Additionally, I just realised that the URLs causing the problem are in the README.md, where \url{} doesn't work (it is roxygen syntax used only for function documentation).

Let's keep it as is...
Closes #13