Open jafooool opened 1 month ago
Which Datashare version?
I tried with 18.3.0 (the latest release, from yesterday) and it works fine with PDFs, both with and without OCR.
I've seen this kind of error before when there is a low-level issue with file access, or when PDF files are badly encoded.
Are you sure that your PDF files are not corrupted, and that access to the filesystem is OK?
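One cheap way to check both questions is a quick pass over the files before re-indexing. The following is a minimal sketch, not part of Datashare: the function names `check_pdf` and `scan` are hypothetical, and the test is only a sanity check (file is readable, has a `%PDF-` header near the start and an `%%EOF` marker near the end). A file can pass it and still be malformed inside.

```python
from pathlib import Path


def check_pdf(path: Path) -> str:
    """Return "ok" or a short description of the first problem found."""
    try:
        with path.open("rb") as fh:
            head = fh.read(1024)           # the header should be near the start
            fh.seek(0, 2)                  # jump to the end to learn the size
            size = fh.tell()
            fh.seek(max(0, size - 1024))   # the trailer should be near the end
            tail = fh.read()
    except OSError as exc:
        # Permission problems, broken mounts, vanished files, etc.
        return f"unreadable: {exc}"
    if size == 0:
        return "empty file"
    if b"%PDF-" not in head:
        return "missing %PDF- header"
    if b"%%EOF" not in tail:
        return "missing %%EOF marker (possibly truncated)"
    return "ok"


def scan(root: Path) -> dict:
    """Check every *.pdf under root (case-sensitive match) and return only the failures."""
    failures = {}
    for path in sorted(root.rglob("*.pdf")):
        status = check_pdf(path)
        if status != "ok":
            failures[path] = status
    return failures
```

Calling `scan(Path("/path/to/documents"))` (the path is a placeholder) returns a mapping from each suspicious file to the reason it was flagged. An empty result suggests the problem lies elsewhere than basic file access or truncation.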
Describe the bug
Added a trove of PDF files ... launched indexing ... got only:
Error writing: org.apache.tika.sax.TaggedSAXException: Error writing: org.xml.sax.SAXException: Error writing: java.io.IOException: Read end dead
2024-10-08 10:35:10,354 [Apache Tika: XXXXX.pdf] WARN PDFStreamEngine - org.apache.tika.sax.TaggedSAXException: Error writing: org.apache.tika.sax.TaggedSAXException: Error writing: org.xml.sax.SAXException: Error writing: java.io.IOException: Read end dead
etc. No PDF files get indexed
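To see which files the warnings refer to, the file names can be pulled out of the log. This is a rough sketch that assumes every relevant line follows the layout shown above (`[Apache Tika: <file name>]` followed by the log level). The helper name `failing_files` is hypothetical, and matching `ERROR` as well as `WARN` is an assumption, since only `WARN` lines appear in this report.

```python
import re

# Matches the thread-name and level part of lines such as:
#   2024-10-08 10:35:10,354 [Apache Tika: XXXXX.pdf] WARN PDFStreamEngine - ...
# Assumption: ERROR lines, if any, use the same layout as the WARN lines.
TIKA_LINE = re.compile(r"\[Apache Tika: (?P<name>[^\]]+)\]\s+(?:WARN|ERROR)\b")


def failing_files(log_text: str) -> list:
    """Return the distinct file names from WARN/ERROR Tika lines, in first-seen order."""
    seen = {}
    for match in TIKA_LINE.finditer(log_text):
        # dict keys keep insertion order, so this de-duplicates without reordering
        seen.setdefault(match.group("name"), None)
    return list(seen)
```

If only a handful of names come back, those files are the first candidates to test individually. If every PDF in the trove is listed, a filesystem or configuration problem looks more likely than corrupted documents.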
Desktop (please complete the following information):
latest available version of DATASHARE