-
I am facing a problem while extracting content from PDFs: the returned content is None for image-based PDFs. The same code works on my local setup but fails on AWS Lambda.
I have…
-
I received the following error message the first time I ran Tika (OS: Windows 10, set up as "airgapped" with no reliance on the internet, using tika-server version 1.24.1):
**URLError:**
I have found i…
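
For an air-gapped machine like the one described above, the tika-python README suggests downloading the server jar (and its `.md5` file) ahead of time and pointing the `TIKA_SERVER_JAR` environment variable at it with a `file://` URI. A minimal sketch, assuming a hypothetical jar location and the hypothetical helper name `configure_airgap`; whether this resolves the specific `URLError` reported here is not confirmed:

```python
import os
from pathlib import Path


def configure_airgap(jar_path: str) -> str:
    """Point tika-python at a locally stored server jar.

    `jar_path` is a hypothetical location chosen for illustration.
    Returns the file:// URI that was written to TIKA_SERVER_JAR.
    """
    # as_uri() requires an absolute path, so resolve it first.
    uri = Path(jar_path).resolve().as_uri()
    os.environ["TIKA_SERVER_JAR"] = uri
    return uri


# Assumption: the jar was copied to this path by hand beforehand.
jar_uri = configure_airgap("/tmp/tika-server.jar")
print(jar_uri)
```

The variable should be set before `tika` is imported, since the library is understood to read its configuration from the environment when the module loads.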
-
I created a function that parses a PDF file using Tika in a service, and when I tried to dockerize it, it displayed this error:
parse_pdf(tmp_path)
File "/app/process.py", line 90, in parse_pd…
-
Hi @chrismattmann ,
Fantastic library! I was wondering if you have near-term plans or a roadmap to make it compatible with Apache Tika version 2.1.0.
I used the `tika-server-standard-2.1.0.jar` file from …
-
Sentry Issue: [OPEN-8NA](https://mit-office-of-digital-learning.sentry.io/issues/3984637609/?referrer=github_integration)
```
TransportError: TransportError(413, '{"Message":"Request size exceeded 10…
-
The startServer function attempts to concatenate the given classpath to the tika jar path with a colon. This is appropriate for Linux, but not for Windows where the correct character is the semicolon.…
-
The parse option uses /rmeta and then conflates all the keys together so that users can't tell which metadata goes with the primary file and which metadata goes with which embedded file.
Would it b…
-
Hi,
I am posting a file into my database using Django as the backend framework. I would like to read the data from the file while parsing it.
However, I am getting the error:
`AttributeError: 'InMemoryUploade…
-
I am getting an error while trying to convert a .docx file to XHTML output.
A similar issue occurred with several other files (doc/docx/pdf).
UnicodeEncodeError Description:
![image](https://user-imag…
-
### Description
Processing of files is failing with:
```
2022-12-14.eml: Error while consuming document 2022-12-14.eml: Error while converting document to PDF: 503 Server Error: Service Unavailab…