Hi,
Are you running the Abot.Demo application? If not, please send me your
config file. If you are, this is most likely your problem:
The demo project sets a few config values that greatly limit Abot's
speed. This is to make sure you don't get banned by your ISP or
blocked by the sites you are crawling. These settings are:
<abot>
  <politeness
    ...(excluded)
    minCrawlDelayPerDomainMilliSeconds="1000"
    ...(excluded)
  />
</abot>
Change it to...
<abot>
  <politeness
    ...(excluded)
    minCrawlDelayPerDomainMilliSeconds="0"
    ...(excluded)
  />
</abot>
This tells Abot not to wait between crawl requests to the same domain.
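To put a rough number on how much a per-domain delay limits speed, here is a small arithmetic sketch in plain C#. It is not Abot code: `CrawlDelayMath` and `MaxRequestsPerDomain` are hypothetical names, and it assumes the delay is enforced between the starts of consecutive requests to the same domain, so adding threads would not raise the cap for a single-domain crawl.

```csharp
using System;

public static class CrawlDelayMath
{
    // Hypothetical helper: upper bound on how many requests can be sent to ONE domain
    // before the crawl timeout, assuming at least minDelayMs between request starts.
    // Returns null when the delay is 0, meaning the delay itself imposes no cap.
    public static long? MaxRequestsPerDomain(int minDelayMs, int timeoutSeconds)
    {
        if (minDelayMs < 0) throw new ArgumentOutOfRangeException(nameof(minDelayMs));
        if (timeoutSeconds < 0) throw new ArgumentOutOfRangeException(nameof(timeoutSeconds));
        if (minDelayMs == 0) return null;

        // The first request can start at t = 0, then at most one more every minDelayMs.
        return (long)timeoutSeconds * 1000 / minDelayMs + 1;
    }
}
```

For example, with a 1000 ms delay and a 100 second timeout, `MaxRequestsPerDomain(1000, 100)` gives 101, so a single-domain crawl with MaxPagesToCrawl = 1000 would stop at the timeout long before reaching its page limit.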
Original comment by sjdir...@gmail.com
on 18 Dec 2013 at 1:44
Here is what I have.
CrawlConfiguration crawlConfig = new CrawlConfiguration();
crawlConfig.CrawlTimeoutSeconds = 100;
crawlConfig.MaxConcurrentThreads = 10;
crawlConfig.MaxPagesToCrawl = 1000;
crawlConfig.UserAgentString = "Test";
crawlConfig.MinCrawlDelayPerDomainMilliSeconds = 0;
Original comment by P...@stephendownward.ca
on 18 Dec 2013 at 9:45
On v1.1.1 I updated GetDefaultWebCrawler() in Abot.Demo.Program.cs to
match what you have above; however, I don't see any slowness. It's crawling
50-100 pages per second. See the attached log file.
private static IWebCrawler GetDefaultWebCrawler()
{
    CrawlConfiguration crawlConfig = new CrawlConfiguration();
    crawlConfig.CrawlTimeoutSeconds = 100;
    crawlConfig.MaxConcurrentThreads = 10;
    crawlConfig.MaxPagesToCrawl = 1000;
    crawlConfig.UserAgentString = "Test";
    crawlConfig.MinCrawlDelayPerDomainMilliSeconds = 0; // no politeness delay between requests to the same domain

    // Passing null for the other dependencies makes Abot fall back to its default implementations
    return new PoliteWebCrawler(crawlConfig, null, null, null, null, null, null, null, null);
}
Can you do a fresh checkout of v1.1.1, overwrite the Abot.Demo.Program.cs file
with the attached one, and then give it a run?
Original comment by sjdir...@gmail.com
on 18 Dec 2013 at 9:27
Okay, the problem was that I was crawling a really slow website. However, when I
crawl apple.com, it starts out fast but slows down a lot by the 800th page.
Original comment by P...@stephendownward.ca
on 19 Dec 2013 at 10:37
Original comment by sjdir...@gmail.com
on 30 Dec 2013 at 3:13
I've encountered something similar: the crawl starts off quickly but then
becomes about 10x slower at around 1K pages or so. I'm troubleshooting
this now but saw this issue and it sounded similar. I was crawling
www.seriouseats.com and ipython.org when I encountered this. Does anyone have
any additional info on why this may be happening? At this point I'm not sure
whether it is the crawler itself or some rate limiting that is being initiated
by the target.
Original comment by b...@luceomedia.com
on 31 Jul 2014 at 3:05
Hi,
It's very likely that the site is throttling you or is being overwhelmed.
A few things to try:
1: Run Fiddler and monitor how long the site takes to respond to individual
requests.
2: Open a browser on the same machine while the crawl is running slowly and
request some of the URLs that are taking a long time. If the browser also takes
forever to pull up the page, then it's the server, not Abot.
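If Fiddler isn't handy, a rough code-level variant of step 1 is to time some requests yourself, outside Abot, and compare early response times with later ones. This is a sketch using only standard .NET (`HttpClient`, `Stopwatch`), not part of Abot; `LatencyCheck`, `TimeGetAsync` and `SlowdownRatio` are hypothetical names, and comparing the median of the first and last samples is just one simple heuristic.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class LatencyCheck
{
    // Times one GET request. GetAsync's default completion option waits until the
    // response body has been buffered, so this measures the full download.
    public static async Task<double> TimeGetAsync(HttpClient client, Uri url)
    {
        var stopwatch = Stopwatch.StartNew();
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            // Status code is ignored; only the elapsed time matters here.
        }
        return stopwatch.Elapsed.TotalMilliseconds;
    }

    // Median of the last `window` samples divided by the median of the first `window`.
    // A value well above 1 means the server got slower over time.
    public static double SlowdownRatio(IReadOnlyList<double> latenciesMs, int window)
    {
        if (window <= 0 || latenciesMs.Count < 2 * window)
            throw new ArgumentException("Need at least 2 * window samples.");

        double early = Median(latenciesMs.Take(window));
        double late = Median(latenciesMs.Skip(latenciesMs.Count - window));
        return late / early;
    }

    private static double Median(IEnumerable<double> values)
    {
        double[] sorted = values.OrderBy(v => v).ToArray();
        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

For example, timing a few of the crawled URLs early in the crawl and again once it has slowed down, and getting a ratio near 10, would suggest the server itself is responding more slowly, since these requests never go through Abot.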
Hope that helps...
Steven
Original comment by sjdir...@gmail.com
on 31 Jul 2014 at 4:39
Original issue reported on code.google.com by
P...@stephendownward.ca
on 18 Dec 2013 at 12:31