xhoong closed this issue 7 years ago.
Do you have a URL that can be used to reproduce the problem? Or could you attach an HTML page that causes it?
Sure, here's the landing page:
```html
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0" />
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<base href="" />
```
A new snapshot release was made with the fix. Please try it and confirm.
Thanks for the fast turnaround. I tested it and it's fixed.
Hi, I've been using Norconex and have found it to be a very versatile crawler. When I tried to crawl a new site, I got a NullPointerException (NPE), and found that the page has a `<base/>` tag whose `href` is `""` (an empty string). I think this case needs to be handled, possibly by falling back to `referer.documentBase` when the `<base/>` tag's `href` is empty?
I'm using collector 2.6.2 and importer 2.6.1. I can create a pull request if you agree with the above approach, or you can suggest a different fix. Thanks.
```
java.lang.NullPointerException
	at com.norconex.collector.http.url.impl.GenericLinkExtractor$Referer.<init>(GenericLinkExtractor.java:790)
	at com.norconex.collector.http.url.impl.GenericLinkExtractor.adjustReferer(GenericLinkExtractor.java:317)
	at com.norconex.collector.http.url.impl.GenericLinkExtractor.extractLinks(GenericLinkExtractor.java:301)
	at com.norconex.collector.http.pipeline.importer.LinkExtractorStage.executeStage(LinkExtractorStage.java:73)
	at com.norconex.collector.http.pipeline.importer.AbstractImporterStage.execute(AbstractImporterStage.java:31)
	at com.norconex.collector.http.pipeline.importer.AbstractImporterStage.execute(AbstractImporterStage.java:24)
	at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:91)
	at com.norconex.collector.http.crawler.HttpCrawler.executeImporterPipeline(HttpCrawler.java:335)
	at com.norconex.collector.core.crawler.AbstractCrawler.processNextQueuedCrawlData(AbstractCrawler.java:515)
	at com.norconex.collector.core.crawler.AbstractCrawler.processNextReference(AbstractCrawler.java:401)
	at com.norconex.collector.core.crawler.AbstractCrawler$ProcessReferencesRunnable.run(AbstractCrawler.java:783)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
```
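To illustrate the fallback proposed in the issue (use the document's own URL when `<base href="">` is empty), here is a minimal Java sketch. `BaseHrefResolver` and `resolveBase` are hypothetical names, not part of the Norconex API, and the document URL is passed as a plain string rather than through Norconex's `Referer`/`documentBase`; the actual fix shipped in the snapshot may differ.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class BaseHrefResolver {

    /**
     * Hypothetical helper: picks the base URL used to resolve relative links.
     * A missing or blank <base> href falls back to the document URL instead
     * of producing a null base (the cause of the NPE reported above).
     */
    static String resolveBase(String documentUrl, String baseHref) {
        if (baseHref == null || baseHref.trim().isEmpty()) {
            return documentUrl; // treat as if there were no usable <base> tag
        }
        try {
            // A relative <base href> is itself resolved against the document URL.
            return new URI(documentUrl).resolve(baseHref.trim()).toString();
        } catch (URISyntaxException | IllegalArgumentException e) {
            // Malformed document URL or href: keep the document URL rather than fail.
            return documentUrl;
        }
    }

    public static void main(String[] args) {
        String doc = "http://example.com/dir/page.html";
        System.out.println(resolveBase(doc, ""));        // http://example.com/dir/page.html
        System.out.println(resolveBase(doc, null));      // http://example.com/dir/page.html
        System.out.println(resolveBase(doc, "/other/")); // http://example.com/other/
        System.out.println(resolveBase(doc, "sub/"));    // http://example.com/dir/sub/
    }
}
```

The blank case is short-circuited explicitly rather than passed to `URI.resolve`, so the fallback behaviour does not depend on how the URI library treats an empty reference.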