jetnet closed this issue 8 years ago
FYI: the "standard" Tika lib can parse the mentioned document without any problem:
java -jar tika-app-1.11.jar 150730_2Q_PRESS_RELEASE_J_Final.pdf
The Tika lib does not hit the error, but it uses PDFBox 1.x to parse PDFs. Norconex Importer uses a release candidate of PDFBox 2.x, a newer release that fixes several issues found in the 1.x version.
The problem you found has been fixed. You can try the latest Importer snapshot, which is also included in the latest HTTP Collector snapshot.
Please test and confirm.
no more NPE! Thank you! Great support as usual! :smile:
Hi Pascal,
I've got a lot of PDFs that cannot be imported because of a NullPointerException in EnhancedPDFParser, e.g.:
Could you please take a look at this? Thanks!
P.S. It would be helpful if such "could not import" messages were logged as "WARNINGS" rather than "DEBUG", so they show up at the default log level. Thanks!
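
To illustrate the logging request, here is a minimal sketch of catching a parse failure and reporting it at warning level. It is not the Importer's actual code: the `ImportFailureLogging` class, the `DocumentParser` interface, and the `importDocument` method are hypothetical names, and `java.util.logging` is used only to keep the example dependency-free (the Importer may well use a different logging framework).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class ImportFailureLogging {

    private static final Logger LOG =
            Logger.getLogger(ImportFailureLogging.class.getName());

    /** Hypothetical stand-in for a document parser such as a PDF parser. */
    interface DocumentParser {
        String parse(String reference) throws Exception;
    }

    /**
     * Parses a document and returns its text, or null if parsing failed.
     * Failures are logged at WARNING (not a debug-level such as FINE),
     * so they are visible with the default logging configuration.
     */
    static String importDocument(String reference, DocumentParser parser) {
        try {
            return parser.parse(reference);
        } catch (Exception e) {
            // Include the exception so the stack trace ends up in the log.
            LOG.log(Level.WARNING, "Could not import " + reference, e);
            return null;
        }
    }

    public static void main(String[] args) {
        // Simulate a parser bug like the NullPointerException reported above.
        DocumentParser failing = ref -> {
            throw new NullPointerException("simulated parser bug");
        };
        String result = importDocument("example.pdf", failing);
        System.out.println("Imported content: " + result);
    }
}
```

With this shape, a failing document no longer stops the crawl, and the failure is reported without having to enable debug logging.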