-
apache/tika:latest
apache/tika:latest-full
-
Hello! While trying to extract content from a PDF, I got the following error with very little information:
- ParseError("Parse error occurred: Unable to extract PDF content").
After modifying th…
-
Hi,
I would like to see the Paperless-ngx add-on run two additional containers, Tika and Gotenberg ([docs](https://docs.paperless-ngx.com/configuration/#optional-services)).
I'm willing…
-
If I select a standalone Tika server, I don't see where to enter its configuration details in the settings.
Should that be an option?
-
**Bug description**
Tika Document Reader dependency causing response type exception.
**Environment**
Spring Boot version: 3.3.4
Spring AI Version: 1.0.0-M2
Java Version: OpenJDK 22
**Steps t…
-
Hi, I’d like to use Extractous for my document processing tasks. I often need to extract PDF content as XML to retain structural information, such as page boundaries. This is a feature supported by Ap…
-
I mistakenly posted an issue on Collector about this problem. It turns out that Collector is pulling in Importer as a transitive dependency, which in turn pulls in Tika 1.27.
My application relies on Tik…
-
**Describe the bug**
Added a trove of PDF files ... launched indexing ... got only
Error writing:
org.apache.tika.sax.TaggedSAXException: Error writing:
org.xml.sax.SAXException: Error writing:
ja…
-
[Docling](https://github.com/DS4SD/docling) looks like a promising text extraction library that could possibly augment or replace Apache Tika.
**Update**: Docling added 3.9 support, this is a go!
…
-
Hello, and thank you for your work.
How can I use Tika and Gotenberg?
I activated them in the config. Or are they not active? I have at least activated this in the config file, but I am not sure. b…