JustinBeckwith / linkinator

🐿 Scurry around your site and find all those broken links.
MIT License

Show redirected links #37

Open JustinBeckwith opened 5 years ago

JustinBeckwith commented 5 years ago

Today gaxios quietly follows redirects. Folks may want to see this as a warning, so we should show that info and give an option to make that show up as an error.

JustinBeckwith commented 4 years ago

Starting to take a look here. I can think of a few ways to go about this one.

1. add a --no-redirects flag

This flag, if set, would treat any link that returns a redirection status code (301, 307, etc.) as a broken link. Any redirect found would cause the overall run to fail. A new LinkStatus of REDIRECTED would be added.

If the goal is to reduce the number of redirects you use (something we're actively trying to do at cloud.google.com), this could actually be pretty valuable, if somewhat annoying from time to time.

2. expose a --on-redirect=( FOLLOW | FAIL | WARN ) flag

This is a more configurable version of the above. Instead of turning it into an instant fail, folks may just be informed that following the link did require a redirect along the way. We could use WARN for this state, and include something in the CLI output that indicates it was followed. Alternatively, you could still follow (default) or flat out fail (above).
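
To make the three proposed modes concrete, here is a minimal sketch of how they might map a response onto a link state. Everything in it is an assumption for illustration: `OnRedirect`, `LinkState`, `isRedirect`, and `classify` are hypothetical names, not linkinator's actual API, and mapping WARN to a REDIRECTED state is just one reading of the proposal.

```typescript
// Hypothetical types for illustration only; not linkinator's real API.
type OnRedirect = "FOLLOW" | "FAIL" | "WARN";
type LinkState = "OK" | "BROKEN" | "REDIRECTED";

// Status codes that ask the client to follow a Location header.
const REDIRECT_CODES: number[] = [301, 302, 303, 307, 308];

function isRedirect(status: number): boolean {
  return REDIRECT_CODES.indexOf(status) !== -1;
}

// firstStatus: status of the first response for the link.
// finalStatus: status after any redirects were followed.
function classify(
  firstStatus: number,
  finalStatus: number,
  policy: OnRedirect
): LinkState {
  const finalOk = finalStatus >= 200 && finalStatus < 300;
  if (!finalOk) return "BROKEN"; // a dead target is broken under every policy
  if (!isRedirect(firstStatus)) return "OK"; // no redirect involved
  if (policy === "FAIL") return "BROKEN"; // option 1 behaviour
  if (policy === "WARN") return "REDIRECTED"; // reported, but not a failure
  return "OK"; // FOLLOW: today's silent behaviour
}
```

Under this sketch, `--no-redirects` from option 1 is simply the FAIL case, which is why option 2 reads as a superset of it.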

I'd love some thoughts on the desired approach here. Adding @zeke @marapper @XhmikosR because I think you've all expressed some interest in this one. Adding @bcoe because I wonder if / how we would use this one for googleapis.

XhmikosR commented 4 years ago

I'd be happy with either implementation :)

The second one is clearly more flexible for end users offering the highest customization potential. Not sure how much code it will require, though.

EDIT:

BTW the second solution could be expanded even further to allow redirects only for specific domains. Say one wants to use a shortened URL in some cases, but warn or fail for the others.
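
A rough sketch of that per-domain idea, assuming a rules object with a fallback policy and per-host overrides. `RedirectRules`, `policyFor`, and the hostnames are hypothetical and only illustrate the shape such a setting could take.

```typescript
// Hypothetical configuration shape; not an existing linkinator option.
type OnRedirect = "FOLLOW" | "FAIL" | "WARN";

interface RedirectRules {
  fallback: OnRedirect; // policy for hosts without an override
  byHost: Record<string, OnRedirect>; // per-hostname overrides
}

// Pick the redirect policy for a link based on its hostname.
function policyFor(link: string, rules: RedirectRules): OnRedirect {
  const host = new URL(link).hostname;
  const override: OnRedirect | undefined = rules.byHost[host];
  return override !== undefined ? override : rules.fallback;
}

// Example: shortened URLs may redirect freely, everything else fails.
const rules: RedirectRules = {
  fallback: "FAIL",
  byHost: { "short.example": "FOLLOW" },
};
```

With these example rules, `policyFor("https://short.example/abc", rules)` selects FOLLOW, while a link on any other host falls back to FAIL.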

zeke commented 4 years ago

I'd be in favor of option 2 or some variant thereof. Specifically:

  1. an option to follow redirects or not
  2. a property in the emitted link event object indicating the redirect URL(s)

The app I'm working on has some URLs that redirect multiple times before reaching the target page. For example, help.github.com/enterprise redirects to help.github.com/en/enterprise, which in turn redirects to help.github.com/en/enterprise/2.19. Maybe this is a niche scenario, but perhaps worth considering here.
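
One way to picture the "redirect URL(s)" property is a result object that records every hop. The sketch below is an assumption-laden illustration: `Hop`, `Lookup`, `LinkResult`, and `followChain` are hypothetical names, `lookup` stands in for an HTTP client with automatic redirects turned off, and Location values are assumed to be absolute URLs.

```typescript
// Hypothetical shapes for illustration; not linkinator's real event object.
interface Hop {
  status: number;
  location?: string; // assumed to be an absolute URL in this sketch
}
type Lookup = (url: string) => Hop;

interface LinkResult {
  url: string; // the link as written in the page
  status: number; // status of the final response
  redirects: string[]; // every URL visited after the original, in order
}

// Follow redirects one hop at a time, recording each target.
function followChain(start: string, lookup: Lookup, maxHops = 10): LinkResult {
  const redirects: string[] = [];
  let hop = lookup(start);
  while (
    hop.status >= 300 &&
    hop.status < 400 &&
    hop.location !== undefined &&
    redirects.length < maxHops
  ) {
    redirects.push(hop.location);
    hop = lookup(hop.location);
  }
  return { url: start, status: hop.status, redirects };
}

// In-memory stand-in for the chain described above (no network access).
const table: Record<string, Hop> = {
  "https://help.github.com/enterprise": {
    status: 301,
    location: "https://help.github.com/en/enterprise",
  },
  "https://help.github.com/en/enterprise": {
    status: 302,
    location: "https://help.github.com/en/enterprise/2.19",
  },
  "https://help.github.com/en/enterprise/2.19": { status: 200 },
};
const fakeLookup: Lookup = (url) => {
  const hit: Hop | undefined = table[url];
  return hit !== undefined ? hit : { status: 404 };
};

const result = followChain("https://help.github.com/enterprise", fakeLookup);
```

Here `result.redirects` holds the two intermediate and final URLs, so a consumer could report the whole chain rather than only the last status. The 301 and 302 codes in the table are made up for the example.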

marapper commented 4 years ago

I think that in most cases there is no difference between success and success after a redirect.

Only when we try to optimize a site do we actually need warnings about obsolete redirects. So a new REDIRECTED status that does not lead to a failure is a good decision.

But more important in such cases is the number of redirects in a chain (http://site -> https://site -> https://www.site -> https://www.site/en), not the mere fact of a redirect. I suppose it could work with something like maxRedirects. When you want to fail on any redirect, you can set it to zero. And if you don't, you can still limit it to 2 or 3, or infinity.
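
A small sketch of the maxRedirects idea, assuming the checker already knows the list of redirect targets for a link. `stateForChain` and `LinkState` are hypothetical names, and reporting a within-limit chain as REDIRECTED is one possible choice, not a settled design.

```typescript
// Hypothetical helper for illustration; not part of linkinator.
type LinkState = "OK" | "REDIRECTED" | "BROKEN";

// redirects: URLs visited after the original link, in order.
// maxRedirects: 0 fails on any redirect; Infinity never fails on length.
function stateForChain(redirects: string[], maxRedirects: number): LinkState {
  if (redirects.length > maxRedirects) return "BROKEN";
  return redirects.length > 0 ? "REDIRECTED" : "OK";
}

// The chain from the comment above: http://site needs three hops.
const chain = ["https://site", "https://www.site", "https://www.site/en"];

console.log(stateForChain(chain, 0)); // BROKEN
console.log(stateForChain(chain, 2)); // BROKEN
console.log(stateForChain(chain, 3)); // REDIRECTED
console.log(stateForChain(chain, Infinity)); // REDIRECTED
console.log(stateForChain([], 0)); // OK
```

A single numeric limit covers both extremes of the earlier proposals: zero behaves like a strict no-redirects mode, and infinity behaves like always following.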

tbeseda commented 3 years ago

👋 I realize this issue is a bit old, but I wanted to chime in with a use case for failing or even detecting 301/2 redirects with linkinator.

We use linkinator to find busted links across a large-ish docs site. There's an internal redirect map/middleware for old known paths forwarding to new paths. We often used these deprecated paths internally in the docs. Ideally those internal links would be discoverable so they can be updated to the newer path and not require a hit to the redirect middleware.

bobbyg603 commented 2 years ago

It seems that when Linkinator follows 301 redirects and finds relative paths, it assumes that the path is relative to the original URL. In cases where the domain has changed, it seems to be detecting 404s for URLs that are otherwise valid.
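
If the behaviour is as described, the difference comes down to which base URL a relative link is resolved against. The sketch below uses the standard WHATWG `URL` constructor; the hostnames and paths are made up for illustration, and it does not reproduce linkinator's actual code path.

```typescript
// Made-up URLs for illustration only.
const originalUrl = "https://old.example.com/docs/guide/"; // link as written
const finalUrl = "https://new.example.org/manual/guide/"; // where the 301 landed
const relativeHref = "../install"; // relative link found on the fetched page

// Resolving against the pre-redirect URL points at the old domain,
// where the page may no longer exist.
const resolvedAgainstOriginal = new URL(relativeHref, originalUrl).href;

// Resolving against the post-redirect URL matches what a browser would do.
const resolvedAgainstFinal = new URL(relativeHref, finalUrl).href;

console.log(resolvedAgainstOriginal); // https://old.example.com/docs/install
console.log(resolvedAgainstFinal); // https://new.example.org/manual/install
```

Keeping the final response URL alongside the page body, and using it as the base for relative links, would avoid the false 404s in this scenario.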