eddiejaoude / http-archive-crawler

Powered by 'HTTP Archive' & 'Web Page Test'

Crawl start path needs to accept hostnames with port numbers #13

Status: open (opened by TheOpsMgr 11 years ago)

TheOpsMgr commented 11 years ago

Steps to reproduce:

Use any start URL that includes a port number, e.g. http://selfridges.cloudopsguys.com:81/

Result:

The URL is rejected with the error messages "The input appears to be a DNS hostname but cannot match TLD against known list" and "The input does not appear to be a valid local network name".

Expected Result:

Accept hostname/port number combinations as valid and spider away!
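A minimal sketch of the expected behaviour, written in Python rather than the project's PHP and assuming the fix is to split the port off the start URL *before* the hostname is validated (the error above suggests the validator is being handed `host:port` as a single hostname). `parse_start_url` and `is_valid_hostname` are hypothetical names, not part of this crawler or the php-spider library, and the label check is a simplified stand-in for a real hostname validator.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical, simplified DNS label rule: 1-63 chars of a-z, 0-9 or '-',
# not starting or ending with '-'. IP literals are not handled here.
_LABEL = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)")


def is_valid_hostname(host: str) -> bool:
    """Check a bare hostname (no port) label by label."""
    return len(host) <= 253 and all(_LABEL.fullmatch(label) for label in host.split("."))


def parse_start_url(url: str) -> Tuple[str, Optional[int]]:
    """Return (hostname, port) for a crawl start URL; port is None if absent."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    # .hostname is already lower-cased and has the ':port' suffix stripped,
    # so the hostname check never sees the port number.
    if not is_valid_hostname(parts.hostname):
        raise ValueError(f"invalid hostname: {parts.hostname!r}")
    # .port raises ValueError for a non-numeric or out-of-range port.
    return parts.hostname, parts.port
```

With this split, `parse_start_url("http://selfridges.cloudopsguys.com:81/")` yields the host `selfridges.cloudopsguys.com` and port `81`, while passing the combined `host:port` string straight to `is_valid_hostname` still fails, which mirrors the reported error.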

eddiejaoude commented 11 years ago

I have opened an issue with the author of the spider library (https://github.com/matthijsvandenbos/php-spider/issues/7) asking whether I am missing some configuration; the documentation is a bit thin.

eddiejaoude commented 11 years ago

I have added code that lets the user enter a port number, but it is disabled until the spider library supports ports.