What steps will reproduce the problem?
1. A crawler thread is fired off at the beginning with a few seeds which are
crawled without any issues.
2. Now the crawler thread is waiting for new urls to be added to the
'Frontier', and the monitor thread waits for 30 seconds before shutting
everything down.
3. After, say, 10 seconds I add a new seed to the frontier by calling the
'addSeed(String pageUrl, int docId)' function, providing the URL and my own
random doc ID. The WebURL is successfully added to the frontier via a call to
'frontier.schedule(webUrl);'
4. Here is the problem. If you look at the source code of the
'schedule(WebURL url)' function, the WebURL is added to the workQueues, but the
WebCrawler thread is waiting (in the 'getNextURLs' method) for the monitor
object to be notified. The 'schedule(WebURL url)' function is missing a notify
call, so the crawler simply keeps waiting and eventually shuts down after the
30-second period is over.
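The race described in steps 2 to 4 can be sketched as follows. This is a minimal, illustrative simplification, not crawler4j's actual source: class, method, and field names ('MiniFrontier', 'getNextURL', 'waitingList') are assumptions, and the 'notifyAll()' inside 'schedule()' is the fix this report suggests is missing.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Minimal sketch of the suspected race: a crawler thread waits on a monitor
// for new URLs, so schedule() must notify that monitor; otherwise the
// waiting thread sleeps until the shutdown timeout fires.
class MiniFrontier {
    private final Queue<String> workQueue = new ArrayDeque<>();
    private final Object waitingList = new Object();

    // Per the report, crawler4j 3.5 enqueued the URL here but never
    // notified waiting crawler threads.
    public void schedule(String url) {
        synchronized (waitingList) {
            workQueue.add(url);
            waitingList.notifyAll(); // the missing call, per the report
        }
    }

    // Simplified analogue of getNextURLs(): wait until a URL arrives
    // or the timeout elapses, then return it (or null on timeout).
    public String getNextURL(long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        synchronized (waitingList) {
            while (workQueue.isEmpty()) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return null; // timed out: the monitor would shut down here
                }
                waitingList.wait(remaining);
            }
            return workQueue.poll();
        }
    }
}
```

With the 'notifyAll()' in place, a thread blocked in 'getNextURL' wakes as soon as 'schedule' runs instead of idling until the timeout.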
What is the expected output? What do you see instead?
-
What version of the product are you using?
crawler4j-3.5
Please provide any additional information below.
-
Original issue reported on code.google.com by bassim.b...@googlemail.com on 18 Sep 2014 at 8:51