Closed: ebbesson closed this issue 6 years ago.
I've currently mitigated this by adding a "StringUtils.isBlank()" check on contentType in com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.isHTMLByContentType, and this seems to work. I suspect the root cause is further up in the code, though.
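A minimal sketch of the mitigation described above, as a standalone method: guard against a null or blank Content-Type before matching it, so a failed PhantomJS run (which yields no header) returns false instead of throwing an NPE. The class name, the HTML type list, and the hand-rolled blank check (standing in for Apache Commons Lang's StringUtils.isBlank) are assumptions for illustration, not the actual PhantomJSDocumentFetcher code.

```java
public class ContentTypeGuard {

    // Returns true only when the content type is present and looks like HTML.
    // A null/blank value (e.g. after PhantomJS exits with value 137 and no
    // response headers are captured) short-circuits to false.
    static boolean isHTMLByContentType(String contentType) {
        // Equivalent to StringUtils.isBlank(contentType) from Commons Lang,
        // inlined here to keep the sketch dependency-free.
        if (contentType == null || contentType.trim().isEmpty()) {
            return false;
        }
        // Hypothetical matching logic; the real fetcher's rules may differ.
        String cleaned = contentType.trim().toLowerCase();
        return cleaned.startsWith("text/html")
                || cleaned.startsWith("application/xhtml+xml");
    }

    public static void main(String[] args) {
        System.out.println(isHTMLByContentType(null));
        System.out.println(isHTMLByContentType("   "));
        System.out.println(isHTMLByContentType("text/html; charset=UTF-8"));
    }
}
```

With this guard in place, a document whose fetch failed is simply rejected as non-HTML rather than crashing the crawler thread.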
Thanks for reporting and finding the cause! The fix will be in the next snapshot release. I will let you know when it is available.
This is now fixed in latest snapshot release. Please confirm.
Closing for lack of feedback. Please re-open if you witness any issues with the fix.
I'm experiencing issues while using the PhantomJSFetcher. Every other run or so, PhantomJS exits with value 137, and this seems to cause an NPE when the fetcher checks the content type.
ERROR SystemCommand:304 - Command returned with exit value 137 (command properly escaped?). Command: ./phantomjs-2.1.1-linux-x86_64/bin/phantomjs --ssl-protocol=any --ignore-ssl-errors=true --web-security=false --cookies-file=/tmp/cookies.txt --load-images=false /app/tron/crawler/scripts/phantom.js http://example.com/path/more/page/ /tmp/1507889738145000000 1000 -1 http sepu 1.0 Error: ""
INFO REJECTED_ERROR:67 - REJECTED_ERROR: http://example.com/path/more/page/
ERROR AbstractCrawler:549 - SHB crawler attachment: Could not process document: http://example.com/path/more/page/ (null)
java.lang.NullPointerException
    at com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.isHTMLByContentType(PhantomJSDocumentFetcher.java:640)
    at com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.fetchDocument(PhantomJSDocumentFetcher.java:507)
    at com.norconex.collector.http.pipeline.importer.DocumentFetcherStage.executeStage(DocumentFetcherStage.java:42)
    at com.norconex.collector.http.pipeline.importer.AbstractImporterStage.execute(AbstractImporterStage.java:31)
    at com.norconex.collector.http.pipeline.importer.AbstractImporterStage.execute(AbstractImporterStage.java:24)
    at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:91)
    at com.norconex.collector.http.crawler.HttpCrawler.executeImporterPipeline(HttpCrawler.java:358)
    at com.norconex.collector.core.crawler.AbstractCrawler.processNextQueuedCrawlData(AbstractCrawler.java:521)
    at com.norconex.collector.core.crawler.AbstractCrawler.processNextReference(AbstractCrawler.java:407)
    at com.norconex.collector.core.crawler.AbstractCrawler$ProcessReferencesRunnable.run(AbstractCrawler.java:789)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
I'm running the following versions: