Open johnwnowlin opened 2 years ago
The file causing the error came from a Konica copier and appears to be a TIFF embedded in a PDF. I suspect this error is related to issues #145 and #142, only because Tika needs to extract information from a TIFF. I do not see how to add the optional dependencies to the .NET build to check whether that is the problem. Does anybody know how that is accomplished?
It would be really nice to get an example that crashes so we could try to correct this issue in future releases.
Tika is crashing on a PDF (which contains confidential information, so sorry, I can't post it). The crash occurs at line 30 of StreamTextExtractor.cs while attempting to extract text from the PDF.
Exception details:
System.NullReferenceException
HResult=0x80004003
Message=Object reference not set to an instance of an object.
Source=TikaOnDotNet
StackTrace:
   at org.apache.jempbox.impl.XMLUtil.getStringValue(Element node)
Oddly, even though this code is in a try/finally block, it throws an exception. If it would let me catch the exception, we could just ignore this file and keep going.
I can open the file in Adobe. I have also saved it as a new PDF, which fails as well.
Is it possible to catch this error so the code can keep going?
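One point that may help with the question above: in C#, a try/finally block on its own does not catch anything. The finally clause runs and the exception then keeps propagating to the caller, so a catch clause is needed somewhere up the call stack. Below is a minimal sketch of skipping a failing file and continuing with the rest. The `SafeExtraction` class, the `ExtractAll` method, and the `extract` delegate are hypothetical names used for illustration; `extract` stands in for whatever call performs the Tika extraction in your code, and this is not the library's own API.

```csharp
using System;
using System.Collections.Generic;

public static class SafeExtraction
{
    // Hypothetical helper: runs `extract` on each path and keeps going on failure.
    // `extract` is an assumed stand-in for the real text-extraction call.
    public static Dictionary<string, string> ExtractAll(
        IEnumerable<string> paths,
        Func<string, string> extract,
        List<string> failed)
    {
        var results = new Dictionary<string, string>();
        foreach (var path in paths)
        {
            try
            {
                results[path] = extract(path);
            }
            catch (Exception ex)
            {
                // A catch clause (not just finally) is what stops propagation.
                // Record the file and move on to the next one.
                failed.Add(path);
                Console.Error.WriteLine(
                    $"Skipping '{path}': {ex.GetType().Name}: {ex.Message}");
            }
        }
        return results;
    }
}
```

Catching the broad `Exception` type is a deliberate choice here, on the assumption that the goal is only to skip unreadable files; narrowing it to the specific exception type your extraction call throws would be safer if other failures should still stop the run.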