Open diegorondini opened 2 years ago
I like this idea. I'm just not sure how exactly one would pass all the possible header fields to mlc. Via a command-line argument?
Probably the best option would be a config file, otherwise it would be impractical to specify different headers for different URLs.
See for example: https://github.com/orgs/github-community/discussions/14773#discussioncomment-2679987 https://github.com/tcort/markdown-link-check#config-file-format
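For the command-line question above, here is a minimal sketch of splitting a curl-style "Name: value" string into a header name and value. The function name parse_header_arg is hypothetical and is not part of mlc; it only illustrates what a header given as an argument or as a config value would need to be turned into.

```rust
/// Hypothetical helper (not part of mlc): split a curl-style
/// `Name: value` header string into its name and value.
/// Returns `None` when there is no colon or the name is empty.
fn parse_header_arg(arg: &str) -> Option<(String, String)> {
    // Split on the first colon only, so values may contain colons.
    let (name, value) = arg.split_once(':')?;
    let name = name.trim();
    if name.is_empty() {
        return None;
    }
    Some((name.to_string(), value.trim().to_string()))
}

fn main() {
    let parsed = parse_header_arg("Accept-Encoding: zstd, br, gzip, deflate");
    println!("{:?}", parsed);
}
```

With more than a handful of headers, or headers that differ per URL, a list of such strings gets unwieldy on the command line, which is the argument for a config file.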
I think your pipeline has been hit by this bug: https://github.com/becheran/mlc/actions/runs/3559864946/jobs/5979511630
[Err ] ./README.md (62, 22) => https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions - 403 - Forbidden
Error: https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions. 403 - Forbidden
@diegorondini fun fact: It does not fail when I run it locally. Does GitHub somehow prevent requests to github.com from their own runners? You mention missing request parameters. What would those be in this case?
@becheran I think the first question is why the pipeline checks that link even though there's no such link in the README.md:
$ grep 'docs\.github' README.md
Returning to this bug, docs.github.com requires the Accept-Encoding: zstd, br, gzip, deflate header:
$ curl -i -X GET https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions
HTTP/2 403
x-azure-ref: 0wn2EYwAAAACr4P2HgpUzTatC1/nj5XnyTU5aMjIxMDYwNjEzMDIxADU5NmQ3OGEyLWNhNWYtNDc5ZC1iY2RjLTA4MzU4MzMxNzRiMg==
accept-ranges: bytes
via: 1.1 varnish, 1.1 varnish
date: Mon, 28 Nov 2022 09:22:10 GMT
x-served-by: cache-iad-kiad7000135-IAD, cache-mrs10563-MRS
x-cache: MISS, MISS
x-cache-hits: 0, 0
x-timer: S1669627330.213655,VS0,VE92
strict-transport-security: max-age=31557600
$ curl -i -H "Accept-Encoding: zstd, br, gzip, deflate" -X GET https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions
HTTP/2 200
cache-control: public, max-age=60
content-type: text/html; charset=utf-8
access-control-allow-origin: *
content-security-policy: default-src 'none';prefetch-src 'self';connect-src 'self';font-src 'self' data: githubdocs.azureedge.net;img-src 'self' github.com *.github.com *.githubusercontent.com *.githubassets.com data: githubdocs.azureedge.net placehold.it;object-src 'self';script-src 'self' data: githubdocs.azureedge.net;frame-src 'self' github.com *.github.com *.githubusercontent.com *.githubassets.com https://www.youtube-nocookie.com;frame-ancestors 'self' github.com *.github.com *.githubusercontent.com *.githubassets.com;style-src 'self' 'unsafe-inline' data: githubdocs.azureedge.net;child-src 'self';upgrade-insecure-requests;base-uri 'self';form-action 'self';script-src-attr 'none'
cross-origin-opener-policy: same-origin
cross-origin-resource-policy: same-origin
x-dns-prefetch-control: off
x-frame-options: SAMEORIGIN
x-download-options: noopen
x-content-type-options: nosniff
origin-agent-cluster: ?1
x-permitted-cross-domain-policies: none
referrer-policy: strict-origin-when-cross-origin
x-xss-protection: 0
x-powered-by: Next.js
x-azure-ref: 0hXyEYwAAAADMF8jkAx/XToTRxIg5u1m/UEhMMzBFREdFMDMxOQA1OTZkNzhhMi1jYTVmLTQ3OWQtYmNkYy0wODM1ODMzMTc0YjI=
content-encoding: br
via: 1.1 varnish, 1.1 varnish
accept-ranges: bytes
date: Mon, 28 Nov 2022 09:22:29 GMT
age: 335
x-served-by: cache-iad-kiad7000135-IAD, cache-mrs10583-MRS
x-cache: CONFIG_NOCACHE, HIT, HIT
x-cache-hits: 3, 1
x-timer: S1669627349.305248,VS0,VE1
vary: Accept-Encoding
strict-transport-security: max-age=31557600
content-length: 38324
Warning: Binary output can mess up your terminal. Use "--output -" to tell
Warning: curl to output it to your terminal anyway, or consider "--output
Warning: <FILE>" to save to a file.
Sorry, I just realized I should have checked out the github-action-output branch.
Now it fails for me as well with 0.15.4:
$ mlc ./README.md
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ +
+ markup link checker - mlc v0.15.4 +
+ +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
09:31:29 [WARN] Broken reference link: Borrowed("possible values: md, html")
09:31:29 [WARN] Strip everything after #. The chapter part '#ci-pipeline-integration' is not checked.
[ OK ] ./README.md (19, 8) => #ci-pipeline-integration -
[ OK ] ./README.md (64, 1) => ./docs/FailingAnnotation.PNG -
[ OK ] ./README.md (32, 28) => https://doc.rust-lang.org/cargo/ -
[ OK ] ./README.md (4, 2) => https://badgen.net/crates/d/mlc?color=blue -
[ OK ] ./README.md (46, 56) => https://github.com/marketplace/actions/markup-link-checker-mlc -
[ OK ] ./README.md (20, 29) => https://rust-lang.github.io/async-book/ -
[ OK ] ./README.md (3, 2) => https://img.shields.io/crates/v/mlc.svg?color=orange -
[ OK ] ./README.md (9, 1) => https://asciinema.org/a/299100 -
[ OK ] ./README.md (9, 2) => https://asciinema.org/a/299100.svg -
[ OK ] ./README.md (6, 2) => https://img.shields.io/badge/License-MIT-yellow.svg -
[ OK ] ./README.md (5, 2) => https://github.com/becheran/mlc/actions/workflows/rust.yml/badge.svg -
[ OK ] ./README.md (7, 2) => https://img.shields.io/badge/PRs-welcome-brightgreen.svg -
[ OK ] ./README.md (3, 1) => https://crates.io/crates/mlc -
[ OK ] ./README.md (4, 1) => https://crates.io/crates/mlc -
[ OK ] ./README.md (32, 92) => https://crates.io/crates/mlc -
[Err ] ./README.md (62, 22) => https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions - 403 - Forbidden
[ OK ] ./README.md (144, 60) => https://github.com/becheran/mlc/blob/master/LICENSE -
[ OK ] ./README.md (75, 32) => https://github.com/becheran/ntest/blob/master/.github/workflows/ci.yml -
[ OK ] ./README.md (79, 37) => https://hub.docker.com/repository/docker/becheran/mlc -
[ OK ] ./README.md (140, 14) => https://github.com/becheran/mlc/blob/master/CHANGELOG.md -
[ OK ] ./README.md (6, 1) => https://opensource.org/licenses/MIT -
[ OK ] ./README.md (112, 221) => https://github.com/becheran/wildmatch -
[ OK ] ./README.md (40, 54) => https://github.com/becheran/mlc/releases -
[ OK ] ./README.md (5, 1) => https://github.com/becheran/mlc/actions/workflows/rust.yml -
[ OK ] ./README.md (7, 1) => https://github.com/becheran/mlc/blob/master/CONTRIBUTING.md -
Result (25 links):
OK 24
Skipped 0
Warnings 0
Errors 1
The following links could not be resolved:
./README.md (62, 22) => https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions.
Ah, right. I made the same mistake and ran it on the wrong branch locally 🤦‍♂️
@diegorondini would 'Accept-Encoding: *' help in this case? Might be a sane default since we don't care about the content anyways right now.
To make it configurable I think a map of links with wildcards and associated headers would make sense as config parameter. Will think about it.
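A minimal sketch of what such a map could look like, assuming a simple rule list where * in a URL pattern matches any run of characters. HeaderRule, wildcard_match and headers_for are hypothetical names for illustration only; this is not how mlc implements it, and a real implementation would likely reuse an existing wildcard crate instead of the hand-rolled matcher here.

```rust
/// Hypothetical rule: headers to send for every URL matching `url_pattern`.
struct HeaderRule {
    url_pattern: &'static str,
    headers: Vec<(&'static str, &'static str)>,
}

/// Returns true when `text` matches `pattern`, where `*` in the pattern
/// stands for any run of characters (including none).
fn wildcard_match(pattern: &str, text: &str) -> bool {
    let parts: Vec<&str> = pattern.split('*').collect();
    if parts.len() == 1 {
        return pattern == text; // no wildcard: require an exact match
    }
    let first = parts[0];
    let last = parts[parts.len() - 1];
    if !text.starts_with(first) {
        return false;
    }
    let mut rest = &text[first.len()..];
    // The literal pieces between wildcards must appear in order.
    for &part in &parts[1..parts.len() - 1] {
        match rest.find(part) {
            Some(idx) => rest = &rest[idx + part.len()..],
            None => return false,
        }
    }
    rest.ends_with(last)
}

/// Collect the headers of every rule whose pattern matches `url`.
fn headers_for(rules: &[HeaderRule], url: &str) -> Vec<(&'static str, &'static str)> {
    rules
        .iter()
        .filter(|rule| wildcard_match(rule.url_pattern, url))
        .flat_map(|rule| rule.headers.iter().copied())
        .collect()
}

fn main() {
    let rules = vec![HeaderRule {
        url_pattern: "https://docs.github.com/*",
        headers: vec![("Accept-Encoding", "zstd, br, gzip, deflate")],
    }];
    let url = "https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions";
    for (name, value) in headers_for(&rules, url) {
        println!("{}: {}", name, value);
    }
}
```

Rules are applied in the order given and all matching rules contribute headers, so a catch-all * rule could carry defaults while narrower patterns add per-site headers.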
@becheran well, not literally:
$ curl -i -H "Accept-Encoding: *" -X GET https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions
HTTP/2 403
[...]
The official way to mean any encoding should be Accept-Encoding: */*, but I don't know how well it works in practice.
https://stackoverflow.com/questions/25182888/does-in-an-http-accepts-encoding-header-mean-gzip-is-supported
The library you're using (reqwest?) may support accepting all encodings. Libcurl does that: https://curl.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html
Not sure though if servers that don't support compression / encoding peacefully decline the "Accept-Encoding" header.
Yes, I am using reqwest. I turned on all supported encodings (brotli, gzip, deflate) and that did the trick for now. But I guess there are other cases where a custom request is still required, for example if an authentication token is required for a specific link.
Is your feature request related to a problem? Please describe.
Some URLs require specific HTTP request parameters. One example is the GitHub docs pages, for example this .md will fail:
The reason is that the page requires specific HTTP headers: https://github.com/github-community/community/discussions/14773
Describe the solution you'd like
It would be nice to have a way to specify HTTP request parameters, possibly per-URL.