Norconex / crawlers

Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.
https://opensource.norconex.com/crawlers
Apache License 2.0

NoSuchMethodError snapshot 7-5-2015 #104

OkkeKlein closed this issue 9 years ago

OkkeKlein commented 9 years ago

Exception in thread "pool-1-thread-1" java.lang.NoSuchMethodError: org.apache.pdfbox.pdmodel.PDDocumentInformation.getDictionary()Lorg/apache/pdfbox/cos/COSDictionary;
    at org.apache.tika.parser.pdf.EnhancedPDFParser.extractMetadata(EnhancedPDFParser.java:300)
    at org.apache.tika.parser.pdf.EnhancedPDFParser.parse(EnhancedPDFParser.java:162)
    at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:117)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:117)
    at com.norconex.importer.parser.impl.AbstractTikaParser$MergeEmbeddedParser.parse(AbstractTikaParser.java:374)
    at com.norconex.importer.parser.impl.AbstractTikaParser.parseDocument(AbstractTikaParser.java:159)
    at com.norconex.importer.Importer.parseDocument(Importer.java:414)
    at com.norconex.importer.Importer.importDocument(Importer.java:314)
    at com.norconex.importer.Importer.doImportDocument(Importer.java:267)
    at com.norconex.importer.Importer.importDocument(Importer.java:195)
    at com.norconex.collector.core.pipeline.importer.ImportModuleStage.execute(ImportModuleStage.java:35)
    at com.norconex.collector.core.pipeline.importer.ImportModuleStage.execute(ImportModuleStage.java:26)
    at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:90)
    at com.norconex.collector.http.crawler.HttpCrawler.executeImporterPipeline(HttpCrawler.java:213)
    at com.norconex.collector.core.crawler.AbstractCrawler.processNextQueuedCrawlData(AbstractCrawler.java:473)
    at com.norconex.collector.core.crawler.AbstractCrawler.processNextReference(AbstractCrawler.java:373)
    at com.norconex.collector.core.crawler.AbstractCrawler$ProcessURLsRunnable.run(AbstractCrawler.java:631)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

kalhomoud commented 9 years ago

Thanks for reporting this. I'm looking into it.

essiembre commented 9 years ago

The risks of dealing with snapshots :-) I fixed the compile error in the Importer project caused by a code change in the latest PDFBox snapshot, and deployed new snapshot releases of both the Importer and the HTTP Collector.
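A NoSuchMethodError like the one above typically means the code was compiled against one version of a library but a different version is on the classpath at runtime. The JVM links a call by method name, parameter types and return type, so the error appears if any of these no longer match. The following is a minimal diagnostic sketch, not part of Norconex, Tika or PDFBox. It assumes you want to check at runtime whether the signature named in the stack trace (`PDDocumentInformation.getDictionary()` returning `COSDictionary`) is present. The class `SignatureCheck` and its method `hasNoArgMethod` are hypothetical names chosen for illustration.

```java
import java.lang.reflect.Method;

public class SignatureCheck {

    /**
     * Hypothetical helper: returns true if the named class can be loaded and
     * exposes a public, no-argument method with the given name whose return
     * type has the given fully qualified name. Returns false otherwise.
     */
    public static boolean hasNoArgMethod(String className, String methodName, String returnTypeName) {
        try {
            // Load without running static initializers, so the check has no side effects.
            Class<?> cls = Class.forName(className, false, SignatureCheck.class.getClassLoader());
            // getMethods() lists public methods, including inherited ones.
            for (Method m : cls.getMethods()) {
                if (m.getName().equals(methodName)
                        && m.getParameterCount() == 0
                        && m.getReturnType().getName().equals(returnTypeName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class missing, or one of its dependencies could not be linked.
            return false;
        }
    }

    public static void main(String[] args) {
        // Signature taken from the NoSuchMethodError message in this issue.
        boolean ok = hasNoArgMethod(
                "org.apache.pdfbox.pdmodel.PDDocumentInformation",
                "getDictionary",
                "org.apache.pdfbox.cos.COSDictionary");
        System.out.println(ok
                ? "getDictionary() returning COSDictionary is present"
                : "signature not found: calls compiled against it will fail with NoSuchMethodError");
    }
}
```

Run with the same classpath as the crawler. A "not found" result points to a mismatch between the PDFBox version Tika was compiled against and the one actually loaded, which is the kind of snapshot drift described above.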

OkkeKlein commented 9 years ago

Fixed.