Yes! What a nightmare! Why did I think this was a good idea?
There are many GitHub Actions that do this, but between newer releases of Node and GitHub's policy of discontinuing support for previous versions each year, they're guaranteed to stop working at some point. Maybe there's a way to run the action with the latest version of Node and the respective actions I'm using. 15-minute simple task vs. 4-hour automation, etc, etc.
I've had the most success with this. I just need to make sure I'm providing the appropriate permissions. https://github.com/lycheeverse/lychee-action
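Roughly what I have in mind, for my own reference. The schedule, file list, and args below are placeholders rather than the final config; the permissions block is the part I keep forgetting:

```yaml
# .github/workflows/link-check.yml (sketch)
name: Link Checker

on:
  workflow_dispatch:          # allow manual runs
  schedule:
    - cron: "0 0 1 */6 *"     # roughly every six months

permissions:
  contents: read              # checkout needs this
  issues: write               # only needed if the workflow ends up filing issues

jobs:
  link-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run lychee
        uses: lycheeverse/lychee-action@v2
        with:
          args: --no-progress README.md
          fail: false         # report broken links instead of failing the job
```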
There are some caveats. First, it can submit duplicate issues: it doesn't check whether a broken link is already listed as a GitHub issue. I'm thinking it would be overkill to run this every month, so probably annually/biannually. Second, I was getting several "too many requests" errors from itch.io. Maybe there is a way to limit the number of requests. Other links appear to work fine. (Most of them are on GitHub, so I would hope they work.)
Ok, much better. The errors I got were for legitimate 404's, and it looks like itch isn't rate limiting the requests anymore.
By default, lychee has a `--max-concurrency` of 128, so setting it to 1 definitely fixed the issue with itch. There's probably a sweet spot somewhere between 1 and 128 below which the too-many-requests error stops getting triggered. Considering this will only run every six months or so, I don't think having a low concurrency will be that big of a deal; the run takes less than a minute without the cap.
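For reference, the cap is just another flag passed through the action's `args` input, so the step from the earlier sketch becomes something like this (file list still a placeholder):

```yaml
      - name: Run lychee (throttled)
        uses: lycheeverse/lychee-action@v2
        with:
          # Default --max-concurrency is 128; 1 is the most conservative
          # setting and is what stopped itch.io from returning 429s.
          args: --no-progress --max-concurrency 1 README.md
          fail: false
```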
Ugh, still running into network errors. In my test environment, keeping concurrency at 8 or 16 seemed to fix it, but now I'm getting the same error even with concurrency set to 1. Whether or not the action trips a 429 is like reading tea leaves.
I did end up emailing Itch to see if they can help. I figure it's happening because the majority of the links are hosted on either GitHub or Itch. GitHub apparently has an aggressive rate limiter, but if I'm not getting any 429 errors from GitHub, just how aggressive is Itch then? 😅
If all else fails, there are some options here (if the action supports them):
https://lychee.cli.rs/troubleshooting/rate-limits/
Mainly, I could limit `--max-retries`, accept 429s as a valid response, or exclude Itch's domain entirely. Again, all of these options would defeat the purpose of having the action in the first place.
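For the record, those options are all plain lychee flags, so they'd also just go in `args`. The values here are examples, not what I'd actually ship:

```yaml
        with:
          # Each of these papers over the 429s rather than fixing them:
          #   --max-retries 0    give up immediately instead of retrying
          #   --accept 200,429   treat "too many requests" as a valid response
          #   --exclude itch.io  skip the domain entirely
          args: >-
            --no-progress
            --max-retries 0
            --accept 200,429
            --exclude itch.io
            README.md
```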
Ok, time to put a pin in this:
This should help a little bit with pruning links on GitHub and other sites, at the very least.
Right now, going through the entire list by hand to see which links are broken is not that efficient. GitHub does have something for this. I'll do some snooping and see if this is worth the effort. A 404 doesn't necessarily mean a repo is gone forever; it could have moved or be temporarily unavailable. So, just an idea of how this might work:
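Something like the issue-reporting example in the lychee-action README, I think: have lychee write its markdown report, then open an issue from that file. I'm going from memory on the output name and report path, so treat this as a sketch; it also still wouldn't dedupe against issues that are already open, which is the part I'd need to figure out.

```yaml
      - name: Run lychee
        id: lychee
        uses: lycheeverse/lychee-action@v2
        with:
          args: --no-progress README.md
          fail: false

      - name: Open an issue with the report
        # Only file an issue when lychee actually found broken links.
        if: steps.lychee.outputs.exit_code != 0
        uses: peter-evans/create-issue-from-file@v5
        with:
          title: Link Checker Report
          content-filepath: ./lychee/out.md
          labels: broken-links, automated
```

One way to handle the duplicate problem might be to search for an open issue with that label first and update it instead of opening a new one, but that would be an extra step on top of this.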