dantleech / fink

PHP Link Checker
MIT License

If a page contains a Location: redirect, the link checker uses the original URL as a base #102

Closed mcdurdin closed 4 years ago

mcdurdin commented 4 years ago

If you manually navigate to https://keyman.com/desktop, the server redirects to /desktop/ via a Location header.

Now, if a page links to that URL, e.g. <a href='http://keyman.com/desktop'>, the link checker appears to keep using the original URL as the base after following the redirect, so relative URLs on the retrieved page resolve against the wrong base (/ instead of /desktop/).

To test:

./vendor/bin/fink https://keyman.com --max-external-distance=0 --max-distance=2 -oreport.json

You'll note errors such as:

{"distance":2,"exception":null,"referrer":"https:\/\/keyman.com\/desktop","referrer_title":"","referrer_xpath":"\/html\/body\/div[2]\/div[5]\/div[1]\/div\/div\/div\/a","request_time":218392,"status":404,"url":"https:\/\/keyman.com\/download.php","timestamp":"2020-06-17T06:10:23+00:00"}

But if you visit the page in your browser, you can see that the download link is the relative URL download.php, which the browser resolves against /desktop/, and it works fine.
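
To make the difference concrete, here is a minimal Python sketch of the two resolutions. It is not fink's code (fink is written in PHP); it only uses the standard library's `urljoin` to show how the same relative link resolves against the requested URL versus the post-redirect URL. The helper name `effective_base` and the variable names are made up for illustration.

```python
from urllib.parse import urljoin

def effective_base(requested_url: str, location_header: str) -> str:
    """Return the URL a page was actually served from after one redirect.

    The Location value may itself be relative (here "/desktop/"), so it is
    resolved against the URL that was requested. Hypothetical helper.
    """
    return urljoin(requested_url, location_header)

requested = "https://keyman.com/desktop"   # URL found in the referring page
location = "/desktop/"                     # Location header described above
href = "download.php"                      # relative link on the retrieved page

# Resolving against the originally requested URL reproduces the reported 404 URL.
wrong = urljoin(requested, href)

# Resolving against the post-redirect URL gives the link the browser follows.
right = urljoin(effective_base(requested, location), href)

print(wrong)
print(right)
```

The first printed URL matches the `url` field in the error above; the second is what a browser requests, because it uses the final URL of the redirect chain as the base.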

dantleech commented 4 years ago

Hm, if I run the above I do not get this issue, only these:

{
  "distance": 1,
  "exception": null,
  "referrer": "https://keyman.com/",
  "referrer_title": "Keyman for iPhone",
  "referrer_xpath": "/html/body/div[1]/div/div[2]/ul[1]/li[5]/a",
  "request_time": 507407,
  "status": 404,
  "http_version": "2",
  "url": "https://keyman.com/iphone/",
  "timestamp": "2020-06-17T09:54:17+01:00"
}
{
  "distance": 1,
  "exception": null,
  "referrer": "https://keyman.com/",
  "referrer_title": "Keyman for iPad",
  "referrer_xpath": "/html/body/div[1]/div/div[2]/ul[1]/li[6]/a",
  "request_time": 739774,
  "status": 404,
  "http_version": "2",
  "url": "https://keyman.com/ipad/",
  "timestamp": "2020-06-17T09:54:17+01:00"
}
{
  "distance": 2,
  "exception": null,
  "referrer": "https://keyman.com/testimonials/",
  "referrer_title": "First Peoples' Cultural Foundation",
  "referrer_xpath": "/html/body/div[2]/div[5]/div/div/h4[21]/a",
  "request_time": 227580,
  "status": 404,
  "http_version": "2",
  "url": "https://keyman.com/testimonials/www.fpcf.ca",
  "timestamp": "2020-06-17T09:54:24+01:00"
}

dantleech commented 4 years ago

Can you try with dev-master? I think we may be overdue for a release 🙈

dantleech commented 4 years ago

I tagged 0.10.0 which contains the fix for the issue I think. Please re-open if this is not the case, thanks.

mcdurdin commented 4 years ago

> I tagged 0.10.0 which contains the fix for the issue I think.

Yes, this appears to resolve the issue, thanks!