-
Hi, I've been using python tika a lot to extract text from some PDFs. Suddenly Tika no longer works with the following code and similar:
```
from tika import parser
document = parser.from_fi…
```
-
Here I tried to read the contents of a scanned PDF.
import tika
tika.initVM()
from tika import parser
PDF_file = "Sample.pdf"
with open(PDF_file, 'rb') as file_obj:
    parsed…
-
## Current Behavior
conda is not working normally on my Mac. Whether I try to uninstall, upgrade, or check info, it returns the same traceback.
Commands I tried:
`conda upgr…
-
Is there any API to shut down the Tika server after it is no longer needed? It would be nice to be able to do this programmatically from within a Python script (after the tika server is no longer neede…
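One way to get a programmatic shutdown is to launch the server process yourself and stop it when you are done. The following is only a minimal sketch under that assumption: it does not touch a server that tika-python started on its own, `managed_server` is a hypothetical helper name, and the jar path in the usage comment is a placeholder.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def managed_server(command, shutdown_timeout=10):
    """Start `command` as a child process and stop it when the block exits."""
    process = subprocess.Popen(command)
    try:
        yield process
    finally:
        process.terminate()  # ask the process to stop first
        try:
            process.wait(timeout=shutdown_timeout)
        except subprocess.TimeoutExpired:
            process.kill()  # force it if it did not exit in time
            process.wait()


# Hypothetical usage; the jar path is a placeholder:
# with managed_server(["java", "-jar", "/path/to/tika-server.jar"]):
#     ...  # parse documents against the running server
```

The context manager guarantees the child process is stopped even if parsing raises an exception inside the `with` block.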
-
An RTF message similar to the one below is wrongly detected as `message/rfc822` and handed off to the email parser even though it starts with an RTF tag. This is likely due to the presence of "Sent: " an…
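Until detection is fixed, a caller could check the leading bytes before trusting the reported type. This is a rough client-side sketch, not how Tika's detector works; `looks_like_rtf` is a hypothetical helper, and it assumes the payload is available as bytes.

```python
def looks_like_rtf(data: bytes) -> bool:
    """Return True if the payload starts with the RTF opening tag."""
    # Skip an optional UTF-8 byte order mark and leading whitespace.
    stripped = data.lstrip(b"\xef\xbb\xbf \t\r\n")
    # RTF documents open with a brace followed by the rtf control word.
    return stripped.startswith(b"{\\rtf")
```

A file for which this returns True but the detected type is `message/rfc822` is a candidate for the misdetection described above.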
-
Dear Members,
I am currently using Tika-Python.
def tika_parser(file_path):
    # Extract text from document
    content = parser.from_file(file_path)
    if 'content' in content:
        te…
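A note on the `'content' in content` check: as far as I know, the parse result can contain the `content` key with a value of `None` (for example when no text could be extracted), so a membership test alone may not be enough. Here is a small sketch of a more defensive lookup; `extract_text` is a hypothetical helper that works on any plain dictionary.

```python
def extract_text(parsed: dict) -> str:
    """Return stripped text from a parse result, or "" when none is present."""
    text = parsed.get("content")  # key may be missing or mapped to None
    return text.strip() if isinstance(text, str) else ""
```

Inside a function like the one above, this could be applied to the dictionary returned by `parser.from_file(file_path)`.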
-
While this is possible, we have to think of a way the user can pass this in.
Right now it just checks that PIL can load it; however, those using Caffe to generate descriptors might want to ensure th…
-
Hi, I tried to use tika-python in **AWS Lambda** with a Docker container image, but it throws an error. I tried installing Java, but it still does not work. Can anyone help me with that?
-
Is it possible to pull bookmarks (table of contents) out of a PDF using tika-python? Thank you in advance for the info!
```
[ ] Preface
[ ] Contents
[ ] Contributors
...
```
-
Can someone assist? I am trying to get tika-python to return JSON with metadata and text when using the Docker image of Tika. I can get the results I want using the curl command, but not with python…
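If the curl call already works, one option is to mirror it directly from Python with the standard library. The sketch below assumes the container exposes Tika Server on `http://localhost:9998` and that the `/rmeta/text` endpoint is the one used with curl; both helper names are hypothetical, and the file name in the usage comment is a placeholder.

```python
import json
import urllib.request


def build_rmeta_request(data: bytes, base_url: str = "http://localhost:9998"):
    """Build a PUT request that asks the server for JSON metadata plus text."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/rmeta/text",
        data=data,
        method="PUT",
        headers={"Accept": "application/json"},
    )


def split_rmeta_response(body: str):
    """Split the first document of an rmeta JSON reply into (metadata, text)."""
    documents = json.loads(body)  # rmeta replies with a JSON list
    first = dict(documents[0])
    text = first.pop("X-TIKA:content", None)  # extracted text, if present
    return first, text


# Hypothetical usage; requires a running server:
# with open("Sample.pdf", "rb") as fh:
#     request = build_rmeta_request(fh.read())
# with urllib.request.urlopen(request) as response:
#     metadata, text = split_rmeta_response(response.read().decode("utf-8"))
```

Keeping request building and response parsing separate makes it easier to compare each half against the working curl command.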