Norconex Importer is a Java library and command-line application meant to "parse" and "extract" content out of a file as plain text, whatever its format (HTML, PDF, Word, etc.). In addition, it lets you perform any manipulation on the extracted text before using it in your own service or application.
Using the TitleGeneratorTagger throws a NullPointerException (NPE), probably because the field is empty or doesn't exist. Strings shouldn't be initialized as null: either initialize them as an empty string or add null checks before they are used.
java.lang.NullPointerException
at java.util.regex.Matcher.getTextLength(Matcher.java:1283)
at java.util.regex.Matcher.reset(Matcher.java:309)
at java.util.regex.Matcher.<init>(Matcher.java:229)
at java.util.regex.Pattern.matcher(Pattern.java:1093)
at com.norconex.importer.handler.tagger.impl.TitleGeneratorTagger.getHeadingTitle(TitleGeneratorTagger.java:286)
at com.norconex.importer.handler.tagger.impl.TitleGeneratorTagger.tagStringContent(TitleGeneratorTagger.java:190)
at com.norconex.importer.handler.tagger.AbstractStringTagger.tagTextDocument(AbstractStringTagger.java:91)
at com.norconex.importer.handler.tagger.AbstractCharStreamTagger.tagApplicableDocument(AbstractCharStreamTagger.java:102)
at com.norconex.importer.handler.tagger.AbstractDocumentTagger.tagDocument(AbstractDocumentTagger.java:53)
at com.norconex.importer.Importer.tagDocument(Importer.java:514)
at com.norconex.importer.Importer.executeHandlers(Importer.java:345)
at com.norconex.importer.Importer.importDocument(Importer.java:316)
at com.norconex.importer.Importer.doImportDocument(Importer.java:266)
at com.norconex.importer.Importer.importDocument(Importer.java:190)
at com.norconex.collector.core.pipeline.importer.ImportModuleStage.execute(ImportModuleStage.java:37)
at com.norconex.collector.core.pipeline.importer.ImportModuleStage.execute(ImportModuleStage.java:26)
at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:91)
at com.norconex.collector.http.crawler.HttpCrawler.executeImporterPipeline(HttpCrawler.java:360)
at com.norconex.collector.core.crawler.AbstractCrawler.processNextQueuedCrawlData(AbstractCrawler.java:538)
at com.norconex.collector.core.crawler.AbstractCrawler.processNextReference(AbstractCrawler.java:419)
at com.norconex.collector.core.crawler.AbstractCrawler$ProcessReferencesRunnable.run(AbstractCrawler.java:812)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
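The top frames of the trace show the exception coming from `Pattern.matcher`, which dereferences its argument and so throws when handed a null string. The snippet below is only a minimal sketch of the kind of null check suggested in this report. The class `HeadingTitleSketch`, the method `headingTitle`, and the `HEADING` pattern are hypothetical stand-ins: the actual code at TitleGeneratorTagger.java:286 is not shown here and may look quite different.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the tagger's heading lookup; not the Norconex source.
public class HeadingTitleSketch {

    // Assumed pattern: first line containing non-whitespace text, trimmed.
    private static final Pattern HEADING =
            Pattern.compile("^\\s*(\\S.*?)\\s*$", Pattern.MULTILINE);

    // Returns the first non-blank line, or "" when there is no usable text.
    public static String headingTitle(String text) {
        // Guard before touching the regex: Pattern.matcher(null) throws an NPE.
        if (text == null || text.isEmpty()) {
            return "";
        }
        Matcher m = HEADING.matcher(text);
        return m.find() ? m.group(1) : "";
    }

    // Reproduces the failure mode from the trace: no null check before matcher().
    public static boolean unguardedCallThrows() {
        try {
            HEADING.matcher(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("unguarded call throws NPE: " + unguardedCallThrows());
        System.out.println("guarded, null input: [" + headingTitle(null) + "]");
        System.out.println("guarded, text input: [" + headingTitle("  My Title  \nbody") + "]");
    }
}
```

Returning an empty string for missing content is one possible choice; skipping title generation for that document would be another, depending on how the tagger is expected to behave.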