UmbrellaDocs / linkspector

Uncover broken links in your content.
Apache License 2.0

[BUG] Reddit links result in 403 when run in Github Actions #76

Open yogan opened 2 months ago

yogan commented 2 months ago

Describe the bug Checking locally works fine, but running in GitHub Actions leads to a 403.

To Reproduce Example README.md:

Redditor [u/Boojum](https://old.reddit.com/user/Boojum) has crafted a nice 
[surprise input](https://old.reddit.com/r/adventofcode/comments/18firip/2023_day_10_an_alternate_input_to_visualize/).

Relevant output from run in GitHub Action:

🚫 2023/day-10-python/README.md, https://old.reddit.com/user/Boojum , 403, 24, null
🚫 2023/day-10-python/README.md, https://old.reddit.com/r/adventofcode/comments/18firip/2023_day_10_an_alternate_input_to_visualize/ , 403, 25, null

I thought it might have to do with old.reddit.com, but this rewrite rule in .linkspector.yml did not help:

replacementPatterns:
  - pattern: "https?://old.reddit.com"
    replacement: 'https://www.reddit.com'
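
For reference, the effect of that rule can be sketched in Python. This assumes Linkspector applies `pattern` as a regular-expression substitution on each URL, which is an assumption here; the helper name `rewrite` is hypothetical. The rewrite only changes the host name, so the request is still sent from the same GitHub Actions runner.

```python
import re

# Pattern and replacement copied from the replacementPatterns rule above.
# The unescaped dots match any character, which is harmless for these URLs.
PATTERN = re.compile(r"https?://old.reddit.com")
REPLACEMENT = "https://www.reddit.com"


def rewrite(url):
    """Return the URL with the old.reddit.com prefix rewritten (hypothetical helper)."""
    return PATTERN.sub(REPLACEMENT, url)


print(rewrite("https://old.reddit.com/user/Boojum"))
# prints "https://www.reddit.com/user/Boojum"
```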

I'm not sure how I can get more details (e.g. headers) to see what is different locally and in GitHub Actions. I'll gladly provide more details if I can get some hints on how to get them.
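
One way to collect such details is a small standalone probe that prints the status code and response headers, run once locally and once in a workflow step. The sketch below uses only the Python standard library; the function name `probe` and the `User-Agent` value are made up for illustration. It does not replicate the request Linkspector itself sends, so it can only show how Reddit treats a plain request from each environment.

```python
import urllib.error
import urllib.request


def probe(url, user_agent="link-check-debug/0.1", timeout=10):
    """Send a GET request and return (status, headers), even for 4xx/5xx responses."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, dict(response.headers.items())
    except urllib.error.HTTPError as error:
        # urllib raises HTTPError for 4xx/5xx, but the error object
        # still carries the status code and the response headers.
        return error.code, dict(error.headers.items())


def report(url):
    """Print the status line and sorted headers for one URL."""
    status, headers = probe(url)
    print(url, status)
    for name, value in sorted(headers.items()):
        print(f"  {name}: {value}")
```

Calling `report("https://old.reddit.com/user/Boojum")` in both environments and comparing the two outputs would show whether the status or headers differ.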

gaurav-nelson commented 1 month ago

Since the response is 403 Forbidden, I think Reddit is blocking requests from GitHub's IP addresses. One workaround would be to add proxy settings to Linkspector, but unless users pay for a proxy, I am not sure how reliable the results would be.

yogan commented 1 month ago

Ah, I see. For me personally, that is not worth the effort. Feel free to close this issue.

I have disabled checking of Reddit links in GitHub Actions with a separate config file that has an ignore rule:

ignorePatterns:
  - pattern: '^https://(old\.|www\.)?reddit\.com'

and a run step that calls npx @umbrelladocs/linkspector check --config .linkspector-github.yml in my workflow YAML file.
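
To see which URLs that ignore rule covers, the pattern can be tried out in Python. This is only a sketch: it assumes Linkspector tests the pattern as an unanchored regular-expression search against each URL, and the helper name `is_ignored` is hypothetical. For this particular pattern, Python and JavaScript regular expressions behave the same way.

```python
import re

# Pattern copied from the ignorePatterns rule above.
IGNORE = re.compile(r"^https://(old\.|www\.)?reddit\.com")


def is_ignored(url):
    """Return True if the ignore pattern matches the URL (hypothetical helper)."""
    return IGNORE.search(url) is not None


for url in (
    "https://old.reddit.com/user/Boojum",
    "https://www.reddit.com/r/adventofcode/",
    "https://reddit.com/r/adventofcode/",
    "http://old.reddit.com/user/Boojum",
    "https://github.com/UmbrellaDocs/linkspector",
):
    print(is_ignored(url), url)
```

The first three URLs match and the last two do not. Plain `http://` Reddit links are not covered by the rule, so they would still be checked.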

Locally I can still check everything every now and then. That's good enough.