Norconex / crawlers

Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.
https://opensource.norconex.com/crawlers
Apache License 2.0

[PhantomJSDocumentFetcher] (2.8.0) NPE in createPhantomJSCommand() #490

Closed: sylvainroussy closed this issue 6 years ago

sylvainroussy commented 6 years ago

Hi, there seems to be a regression in HttpCollector 2.8.0 (not reproducible in 2.7.1). An Ajax crawl throws a NullPointerException:

java.lang.NullPointerException
    at com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.createPhantomJSCommand(PhantomJSDocumentFetcher.java:1030)
    at com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.fetchPhantomJSDocument(PhantomJSDocumentFetcher.java:799)
    at com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.fetchDocument(PhantomJSDocumentFetcher.java:773)
    at com.norconex.collector.http.pipeline.importer.DocumentFetcherStage.executeStage(DocumentFetcherStage.java:42)

on the line:

   cmdArgs.add(argQuote(                              // phantom.js arg 7
                screenshotDimensions.getWidth() + "x"
              + screenshotDimensions.getHeight()));

sylvainroussy commented 6 years ago

Closing because 2.8.1 fixes it:

Fixed NullPointerException when using PhantomJSDocumentFetcher without specifying any "screenshotDimensions".
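
For anyone who cannot upgrade right away, the failure mode is a null "screenshotDimensions" being dereferenced while the positional PhantomJS arguments are built. The following is a minimal sketch of a null guard for that kind of argument, not the actual 2.8.1 patch. The class ScreenshotArgSketch, the Size type, and the addDimensionsArg helper are hypothetical names used only for illustration, and the choice of an empty string as the placeholder value is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class ScreenshotArgSketch {

    // Hypothetical stand-in for the fetcher's dimensions type.
    static final class Size {
        final int width;
        final int height;

        Size(int width, int height) {
            this.width = width;
            this.height = height;
        }
    }

    // Hypothetical helper: appends "WIDTHxHEIGHT" when dimensions are set.
    // When they are not, it appends an empty placeholder (an assumption)
    // so that later positional arguments keep their index.
    static void addDimensionsArg(List<String> cmdArgs, Size dims) {
        if (dims == null) {
            cmdArgs.add("");
        } else {
            cmdArgs.add(dims.width + "x" + dims.height);
        }
    }

    public static void main(String[] args) {
        List<String> cmdArgs = new ArrayList<>();
        addDimensionsArg(cmdArgs, null);                 // no NPE here
        addDimensionsArg(cmdArgs, new Size(1024, 768));
        System.out.println(cmdArgs);
    }
}
```

Based on the changelog entry above, explicitly setting "screenshotDimensions" on the fetcher should also sidestep the problem on 2.8.0, since the exception only occurs when none are specified.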