ericvana / abot

Automatically exported from code.google.com/p/abot
Apache License 2.0

MaxPagesToCrawl is broken #119

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
Crawl http://www.springfieldclinic.com

- Changed maxPagesToCrawl from "10" to "50"
- Changed minCrawlDelayPerDomainMilliSeconds from "1000" to "0"

What is the expected output? What do you see instead?
It should crawl 50 pages, but now that the limit checks happen earlier, the crawl
stops right after all 50 links are scheduled.

Original issue reported on code.google.com by sjdir...@gmail.com on 27 Sep 2013 at 1:08

GoogleCodeExporter commented 9 years ago
fixed in git commit e1aca9d78c7a52a73d3f6b13625724d7326eb0f6.

Issue 119: Fixed a bug with MaxPagesToCrawl. Because ShouldCrawlPage is now
checked before scheduling rather than after, the crawl no longer needs to be
stopped when the maximum page count is reached; the check is done on the main thread.
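A toy model may help show why the stop signal became harmful once the check moved before scheduling. This is a minimal single-threaded sketch in Python under one plausible reading of the report, not Abot's actual C# code; the function `crawl`, the `site` link map, and the `stop_when_limit_hit` flag are hypothetical stand-ins. The page limit is checked before each link is scheduled, mirroring the ShouldCrawlPage change; with the old stop signal enabled, hitting the limit at scheduling time ends the crawl before the queued pages are fetched.

```python
from collections import deque


def crawl(site, start, max_pages, stop_when_limit_hit):
    """Hypothetical toy crawler. `site` maps each URL to the links on that page.

    The page limit is checked before a link is scheduled. If
    `stop_when_limit_hit` is True, reaching the limit also halts the whole
    crawl (the pre-fix behavior); if False, scheduling simply stops and the
    already-queued pages are still crawled (the post-fix behavior).
    """
    queue = deque([start])
    scheduled = 1  # the start page counts toward the limit
    crawled = []
    stop = False
    while queue and not stop:
        url = queue.popleft()
        crawled.append(url)  # "crawl" the page
        for link in site.get(url, []):
            if scheduled >= max_pages:  # limit check happens before scheduling
                if stop_when_limit_hit:
                    stop = True  # old behavior: also stop the entire crawl
                break
            queue.append(link)
            scheduled += 1
    return crawled
```

On a toy site whose home page links to 100 pages with a limit of 50, the stop-signal variant crawls only the home page, while the variant without the stop signal crawls all 50 scheduled pages. Since the limit check alone already prevents scheduling more than 50 pages, the extra stop signal adds nothing except dropping work that was already queued.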

Original comment by sjdir...@gmail.com on 21 Oct 2013 at 1:58

GoogleCodeExporter commented 9 years ago

Original comment by sjdir...@gmail.com on 21 Oct 2013 at 1:59