When a page contains a link whose URL has leading whitespace, e.g. `<a href=" http://abc.com">`, the code treats it as a relative link and prepends the URL of the page being scanned, so the link gets queued as `https://mysite.com/ http://abc.com`. I fixed this by adding:
```php
// trim white space
$linkedUrl = trim($linkedUrl);
```
at the start of the private function `absolutizeUrl($linkedUrl, $currentPageUrl)`.
This seems to fix the problem, but I'm not very familiar with the code, so I'd welcome a review before it is merged.
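For reviewers, here is a minimal, self-contained sketch of why trimming first helps. `absolutizeUrlSketch()` is a hypothetical, simplified stand-in written for illustration, not the project's actual `absolutizeUrl()`: it only distinguishes absolute URLs (those starting with a scheme such as `http:`), protocol-relative, root-relative and path-relative links, and ignores `../`, query strings and fragments. Trimming also roughly matches browser behaviour, since the WHATWG URL parser strips leading and trailing C0 control characters and spaces before parsing; PHP's `trim()` removes `" \t\n\r\0\x0B"`.

```php
<?php
// Hypothetical, simplified stand-in for the crawler's absolutizeUrl().
// NOT the project's real implementation; requires PHP 8 for str_starts_with().
function absolutizeUrlSketch(string $linkedUrl, string $currentPageUrl): string
{
    // Strip surrounding whitespace first, as browsers do with href values.
    $linkedUrl = trim($linkedUrl);

    // An empty href refers to the current page.
    if ($linkedUrl === '') {
        return $currentPageUrl;
    }

    // Absolute URL: begins with a scheme such as "http:" or "mailto:".
    // Without the trim() above, " http://abc.com" would fail this test.
    if (preg_match('~^[a-z][a-z0-9+.-]*:~i', $linkedUrl) === 1) {
        return $linkedUrl;
    }

    $parts = parse_url($currentPageUrl);

    // Protocol-relative link, e.g. "//cdn.example.com/x.js".
    if (str_starts_with($linkedUrl, '//')) {
        return $parts['scheme'] . ':' . $linkedUrl;
    }

    $origin = $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Root-relative link, e.g. "/about".
    if (str_starts_with($linkedUrl, '/')) {
        return $origin . $linkedUrl;
    }

    // Path-relative link: resolve against the current page's directory.
    $path = $parts['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $linkedUrl;
}
```

Removing the `trim()` call from this sketch reproduces the reported symptom: with a current page of `https://mysite.com/`, the link `" http://abc.com"` falls through to the path-relative branch and resolves to `https://mysite.com/ http://abc.com`.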