Norconex / collector-core

Collector-related code shared between different collector implementations
http://www.norconex.com/collectors/collector-core/
Apache License 2.0

Fatal error running multiple crawlers #20

Open danizen opened 6 years ago

danizen commented 6 years ago

VERSIONS:

Apart from the excerpt below, the remaining versions come from dependency inheritance:

    <!-- upstream versions -->
    <norconex.http.version>2.8.0</norconex.http.version>
    <norconex.core.version>1.9.0</norconex.core.version>
    <norconex.importer.version>2.8.0</norconex.importer.version>
    <norconex.es.version>4.1.0</norconex.es.version>
    <norconex.committer.version>2.1.2</norconex.committer.version>

PROBLEM DESCRIPTION:

A NullPointerException occurs when running with multiple crawlers:

monitor_general_crawler: 2018-02-20 14:49:19 FATAL - monitor_general_crawler: An error occured that could compromise the stability of the crawler. Stopping excution to avoid further issues...
java.lang.NullPointerException
        at com.norconex.jef4.job.group.AbstractJobGroup.groupProgressed(AbstractJobGroup.java:87)
        at com.norconex.jef4.suite.JobSuite$2.statusUpdated(JobSuite.java:372)
        at com.norconex.jef4.status.JobStatusUpdater.setProgress(JobStatusUpdater.java:50)
        at com.norconex.collector.core.crawler.AbstractCrawler.setProgress(AbstractCrawler.java:474)
        at com.norconex.collector.core.crawler.AbstractCrawler.processNextReference(AbstractCrawler.java:420)
        at com.norconex.collector.core.crawler.AbstractCrawler$ProcessReferencesRunnable.run(AbstractCrawler.java:812)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
monitor_general_crawler: 2018-02-20 14:49:19 INFO -          CRAWLER_STOPPING

ANALYSIS:

The root job allocated by AbstractCollector in createJobSuite isn't fully wired: although AsyncJobGroup is not abstract, it doesn't allocate a groupUpdater, so the call in AbstractJobGroup.groupProgressed dereferences a null field and throws the NullPointerException.
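
To illustrate the failure mode described in the analysis, here is a minimal, self-contained sketch. It does not use the real JEF classes: `SketchJobGroup`, `wire()`, `groupProgressedUnsafe` and `groupProgressedSafe` are hypothetical names invented for this example, and the null guard is only one possible defensive approach, not necessarily how the project would fix it.

```java
import java.util.function.DoubleConsumer;

// Hypothetical stand-in for a job group; NOT the real JEF AbstractJobGroup.
class SketchJobGroup {
    // Assumption for illustration: the updater is only assigned on some
    // code paths, so it can still be null when progress is reported.
    private DoubleConsumer groupUpdater;
    private double lastProgress;

    // Simulates the wiring step that the issue says is missing.
    void wire() {
        groupUpdater = p -> lastProgress = p;
    }

    // Unguarded call: throws NullPointerException if wire() was never called.
    void groupProgressedUnsafe(double progress) {
        groupUpdater.accept(progress);
    }

    // Guarded call: skips the update (and reports that) when not wired.
    boolean groupProgressedSafe(double progress) {
        if (groupUpdater == null) {
            return false;
        }
        groupUpdater.accept(progress);
        return true;
    }

    double lastProgress() {
        return lastProgress;
    }
}

class GroupUpdaterSketch {
    // Returns true if the unguarded progress call fails with an NPE.
    static boolean throwsNpe(SketchJobGroup group) {
        try {
            group.groupProgressedUnsafe(0.5);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        SketchJobGroup unwired = new SketchJobGroup();
        System.out.println("unwired throws NPE: " + throwsNpe(unwired));
        System.out.println("unwired safe call applied: "
                + unwired.groupProgressedSafe(0.5));

        SketchJobGroup wired = new SketchJobGroup();
        wired.wire();
        System.out.println("wired throws NPE: " + throwsNpe(wired));
    }
}
```

In this sketch, the group that never goes through `wire()` fails on the first progress update, just as the crawler thread does in the stack trace above, while the wired group accepts the update normally.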

DESIRED BEHAVIOR:

No NullPointerException.

danizen commented 6 years ago

Somehow, that crawler's lastActivity keeps updating and it doesn't stop, because the other crawler is still running. I don't know that it is making any progress, however, as this "monitor_general_crawler" is lagging far behind the "monitor_lessdepth_crawler" in progress. For a long time I thought that was just because of the depth.