-
When I extract the text from a PDF, tika-python returns duplicated letters for some words, which I want to avoid.
In the Java version of Tika, the issue seems to be resolved by assigning "suppress…
-
Hi
While working with a 1.7 GB zip file, I'm getting the error below:
[MainThread ] [WARNI] Tika server returned status: 500
Currently I'm using tika-python version 1.2.4
in my environmental …
-
:eyes: Some source code analysis tools can help to find opportunities for improving software components.
:thought_balloon: I propose to [increase the usage of augmented assignment statements](https:/…
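
As a small illustration of what such a change looks like (a generic sketch, not code taken from tika-python), an augmented assignment folds the operator and the assignment into one statement. The function names below are hypothetical.

```python
def total_length_plain(chunks):
    """Sum chunk lengths using a plain assignment."""
    total = 0
    for chunk in chunks:
        total = total + len(chunk)  # the name appears on both sides
    return total


def total_length_augmented(chunks):
    """Same computation written with an augmented assignment."""
    total = 0
    for chunk in chunks:
        total += len(chunk)  # equivalent result for integers
    return total
```

For immutable values such as integers and strings the two forms give the same result; for mutable objects such as lists, `+=` modifies the object in place, so the rewrite is not always behavior-neutral.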
-
Hi, I get this error when parsing a PDF using Tika:
![error Tika Server](https://user-images.githubusercontent.com/23418370/167767008-3605f0f1-8b2e-498c-995f-39edf2a795de.png)
To overcome this issue,…
-
@chrismattmann I am trying to test the language detection of Apache Tika using your wonderful library. I am not sure of the right way to list the language detectors.
https://tika.apache.org/1.25/api/org/a…
-
### Describe the bug
We're developing an application with Docker. After building, I can interact with the docs and see that the endpoints behave as expected, but when running unit tests that em…
-
Hi everyone,
I'm using the 1.24 version of Tika with Python installed through pip.
Tika is installed in a conda env with Python 3.8.5 on Ubuntu 20.04.
I read that if I want to run Tika on the loca…
-
Sorry for such a general issue, but I have been trying hard to extract metadata (Author, Title, Abstract) from a PDF using the tika-python client. Unfortunately, it is not able to extract any data under…
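
A minimal sketch of one way to look for these fields, assuming the result of `parser.from_file` is a dict with a `"metadata"` entry and that the PDF actually embeds the values. The helper `pick_metadata`, the wrapper `extract_pdf_metadata`, and the candidate key names are hypothetical; the keys Tika emits vary by Tika version and by document, and an abstract is often not stored as PDF metadata at all.

```python
def pick_metadata(metadata, candidates):
    """Return the first non-empty value found for each field."""
    found = {}
    for field, keys in candidates.items():
        for key in keys:
            value = metadata.get(key)
            if value:
                found[field] = value
                break
    return found


# Hypothetical key names; print parsed["metadata"].keys() to see the real ones.
CANDIDATE_KEYS = {
    "author": ["dc:creator", "Author", "meta:author"],
    "title": ["dc:title", "title"],
}


def extract_pdf_metadata(path):
    """Parse a PDF with tika-python and pick author/title if present."""
    from tika import parser  # lazy import; needs tika-python and Java

    parsed = parser.from_file(path)
    return pick_metadata(parsed.get("metadata") or {}, CANDIDATE_KEYS)
```

If `extract_pdf_metadata` returns an empty dict, the fields are probably absent from the file's metadata rather than dropped by the client, and would have to be recovered from the extracted text instead.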
-
With the advent of [TIKA-3329](https://github.com/apache/tika/pull/419/files), we can now have a full translation engine in Tika-Python that supports translation from 300+ languages to English. Standardize on thi…
-
Hello,
In the Java version of Tika, in particular in the Tika GUI app, it is possible to print the raw text (after conversion from PDF, docx, etc.) in several formats, like "Formatted t…