Raku / doc

🦋 Raku documentation
https://docs.raku.org/
Artistic License 2.0

Broken remote links #4500

Closed coke closed 4 months ago

coke commented 4 months ago

Added an HTTP link checker to the xt link tester, which found the following issues:

(part of #4476)

    not ok 20 - HTTP 404 https://colabti.org/irclogger/irclogger_log_search/raku
    not ok 2 - HTTP 404 https://github.com/rakudo/rakudo/blob/master/src/core.c/CompUnit/RepositoryRegistry.rakumod
    not ok 5 - HTTP 404 https://raku.land/github:raku-community-modules/Test::Output
    not ok 6 - HTTP 403 https://linux.die.net/man/3/clock_gettime
    not ok 29 - HTTP Unknown error: https://dev.mysql.com/downloads/repo/apt/
    not ok 7 - HTTP Unknown error: http://www.MegaGigaTeraPetaCorp.com/std/disclaimer.txt
    not ok 59 - HTTP Unknown error: https://en.wikipedia.org/wiki/Hyperbolic_function
    not ok 25 - HTTP Unknown error: http://perl6maven.com/perl6-is-a-value-in-a-given-list-of-values

Util commented 4 months ago

On Jul 16, 2024, at 16:23, Will Coleda @.***> wrote:

Added an HTTP link checker to the xt link tester, which found the following issues:

(part of #4476 https://github.com/Raku/doc/issues/4476)

    not ok 20 - HTTP 404 https://colabti.org/irclogger/irclogger_log_search/raku
    not ok 2 - HTTP 404 https://github.com/rakudo/rakudo/blob/master/src/core.c/CompUnit/RepositoryRegistry.rakumod
    not ok 5 - HTTP 404 https://raku.land/github:raku-community-modules/Test::Output
    not ok 6 - HTTP 403 https://linux.die.net/man/3/clock_gettime
    not ok 29 - HTTP 403 https://dev.mysql.com/downloads/repo/apt/
    not ok 7 - HTTP Unknown error: http://www.MegaGigaTeraPetaCorp.com/std/disclaimer.txt
    not ok 67 - HTTP Unknown error: https://en.wikipedia.org/wiki/Cis_%28mathematics%29
    not ok 2 - HTTP Unknown error: https://en.wikipedia.org/wiki/Julian_day
    not ok 25 - HTTP Unknown error: http://perl6maven.com/perl6-is-a-value-in-a-given-list-of-values

The two Wikipedia URLs are exactly correct, as are the MySQL and die.net URLs. Possibly a connection hiccup during retrieval?

Whole website down: http://perl6maven.com/ Temporary? If not:
Old: http://perl6maven.com/perl6-is-a-value-in-a-given-list-of-values
New: https://web.archive.org/web/20190608135817/http://perl6maven.com/perl6-is-a-value-in-a-given-list-of-values

Old: https://raku.land/github:raku-community-modules/Test::Output
New: https://raku.land/zef:raku-community-modules/Test::Output
Had not been updated for change to Zef.

Old: https://github.com/rakudo/rakudo/blob/master/src/core.c/CompUnit/RepositoryRegistry.rakumod
New: https://github.com/rakudo/rakudo/blob/main/src/core.c/CompUnit/RepositoryRegistry.rakumod
Had not been updated after master branch was renamed to main.
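
For illustration only, the two prefix-style fixes above could be applied in bulk along these lines. This is a minimal Python sketch, not what the Raku doc repository actually does; the function name `apply_link_fixes` and the `LINK_FIXES` table are hypothetical. The perl6maven replacement is left out on purpose: its new URL contains the old one, so a naive string replace would not be safe to run twice.

```python
# Hypothetical old -> new URL table, taken from the two fixes listed above.
LINK_FIXES = {
    "https://raku.land/github:raku-community-modules/Test::Output":
        "https://raku.land/zef:raku-community-modules/Test::Output",
    "https://github.com/rakudo/rakudo/blob/master/src/core.c/CompUnit/RepositoryRegistry.rakumod":
        "https://github.com/rakudo/rakudo/blob/main/src/core.c/CompUnit/RepositoryRegistry.rakumod",
}


def apply_link_fixes(text: str, fixes: dict[str, str] = LINK_FIXES) -> str:
    """Return text with every stale URL replaced by its current equivalent."""
    for old, new in fixes.items():
        text = text.replace(old, new)
    return text
```

Because neither new URL contains its old URL, running the function a second time leaves the text unchanged.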

coke commented 4 months ago

Thank you, especially for the web.archive link.

coke commented 4 months ago

Util commented 4 months ago

On Jul 16, 2024, at 17:53, Will Coleda @.***> wrote:

[1] https://linux.die.net/man/3/clock_gettime

--snip--

Based on https://gitlab.com/gitlab-org/gitlab-docs/-/merge_requests/4646 saying "linux.die.net - everything pointing to this site always returns 403 Forbidden", and my own experiments with curl leading me to believe that die.net is using some JS voodoo, I recommend changing that link to an equivalent on another host.

This one looks best to me: https://www.man7.org/linux/man-pages/man3/clock_gettime.3.html

coke commented 4 months ago

This one looks best to me:

Done, thanks.

coke commented 4 months ago

Skip-listed the MySQL URL.

Now getting only this occasionally:

    not ok 67 - HTTP (Stream reset by the server): https://en.wikipedia.org/wiki/Cis_%28mathematics%29

Typically for Wikipedia URLs, but not always the same one. It always runs clean if I then immediately test the one file it was in. Will add a retry mechanism for this particular error.

coke commented 4 months ago

With d8a2bbd6a, the test now runs clean (as far as remote link checks go). There are still some missing internal links, but there's a separate ticket for that.

Util commented 4 months ago

I ran curl -v ... on the Cis_%28mathematics%29 URL 20 times in total (10 today, 10 yesterday), examined the verbose output, and saw no problem. I have not analysed the traffic with Wireshark, but I have a hunch that the problem is Cro's use of persistent connections.

Reading the source of rakudoc-l.rakutest, I notice that $ua is populated with a new Cro::HTTP::Client instance only once. If that line were moved to just before the .get line, a new instance would be used on each call. This would be microscopically less efficient, but it would avoid any effect of the persistent connections.

Alternatively, the docs imply that changing line 25 to my $ua = Cro::HTTP::Client.new( :!persistent ); would disable persistent connections in the same way, with less impact elsewhere. I have not played with Cro enough to know how effective this approach is.
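
To make the "fresh connection per request" idea concrete outside of Cro, here is a minimal Python sketch using only the standard library. It is not the code in the Raku test file; the function name `fetch_status` is hypothetical. It opens a new connection for every URL and sends `Connection: close`, so no socket is reused between checks.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for url over a fresh, non-persistent connection."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    # A new connection object per call: nothing is shared between requests.
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # "Connection: close" asks the server not to keep the socket open.
        conn.request("GET", path, headers={"Connection": "close"})
        response = conn.getresponse()
        response.read()  # drain the body before closing
        return response.status
    finally:
        conn.close()
```

The trade-off is the one described above: each call pays for a new TCP (and possibly TLS) handshake, in exchange for never hitting a connection the server has already decided to drop.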

coke commented 4 months ago

The reuse of the user agent was intentional, old habit.

I've already got the error handling code to retry if there's a 429, a GoAway, or just a stream reset, so I think we're good.
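
For readers who want to see the shape of such a retry, here is a minimal Python sketch. It is an assumption-laden illustration, not the Raku code from the commit: `TransientError` is a hypothetical stand-in for connection-level failures such as a GoAway or a stream reset, and `fetch` is any callable that returns an HTTP status code.

```python
import time
from typing import Callable


class TransientError(Exception):
    """Hypothetical stand-in for a GoAway or a stream reset by the server."""


RETRY_STATUSES = {429}  # Too Many Requests


def get_with_retry(fetch: Callable[[str], int], url: str, attempts: int = 3,
                   delay: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Call fetch(url), retrying on a 429 status or a transient error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    status = 0
    for attempt in range(1, attempts + 1):
        try:
            status = fetch(url)
        except TransientError as err:
            last_error = err
        else:
            if status not in RETRY_STATUSES:
                return status
            last_error = None
        if attempt < attempts:
            sleep(delay * attempt)  # simple linear backoff between tries
    if last_error is not None:
        raise last_error
    return status  # still a retryable status after the final attempt
```

Any other status, including 404 or 403, is returned immediately, so genuinely broken links are still reported on the first attempt.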