chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Apache License 2.0

`module 'tika' has no attribute 'initVM'` and `ImportError: cannot import name 'parser'` #277

Closed: agiokas-dbg closed this issue 4 years ago

agiokas-dbg commented 4 years ago

I've followed the installation instructions, using pip in a virtual environment:

And then set up the environment variables:

TIKA_VERSION=1.23
TIKA_CLIENT_ONLY=True
TIKA_PATH=/tmp/tika-server.jar
TIKA_JAVA=java
PYTHONIOENCODING=utf8

I've also tried to add the jar in the current working directory and use this instead:

TIKA_VERSION=1.23
TIKA_JAVA=java
TIKA_SERVER_JAR="file:////tika-server.jar"
PYTHONIOENCODING=utf8

When using the example script:

import argparse
import tika
tika.initVM()
from tika import parser

ap = argparse.ArgumentParser(prog='extractor',
                        description='Extract Text from PDF programmatically')
ap.add_argument("--input", required = True, help = "Path to the PDF file.")
args = vars(ap.parse_args())
print("WARNING: export PYTHONIOENCODING=utf8")

parsed = parser.from_file(args['input'])
print(parsed["content"])

I get `module 'tika' has no attribute 'initVM'`, and when I remove the `tika.initVM()` call I get `ImportError: cannot import name 'parser'`.

mathiashoeld commented 4 years ago

Are you getting the same issue with older versions, e.g. TIKA_VERSION=1.22?

agiokas-dbg commented 4 years ago

Hi @mathiashoeld, I just tried it with version 1.22 and I am getting the same issue. Should I try an even older version? What annoys me is that my colleague is running the same code in Docker with 1.23, and it works fine.

chrismattmann commented 4 years ago

If you are in a local checkout of tika-python when running it, you will get module import errors. Try cd'ing out of the Git checkout (if you are running this from that dir) and see if that fixes it @agiokas-dbg

aabid0193 commented 4 years ago

Was this issue ever resolved? I am still having this issue when installing with pip.

agiokas-dbg commented 4 years ago

I never managed to solve this.


chrismattmann commented 4 years ago

@aabid0193 @agiokas-dbg can you find your $PYTHON site-packages directory and then `rm -rf tika*` out of there? Try it on a fully fresh Python install. The `.initVM()` error is likely related to the parser module not being found. For whatever reason your tika library isn't being loaded or installed correctly.
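To locate that directory and see what would be removed before deleting anything, something like the following might help. It is a sketch only: `find_entries` is a hypothetical helper, `sysconfig.get_paths()["purelib"]` is assumed to point at the site-packages directory of the interpreter running the script (so run it with the virtual environment's Python), and the match is case-insensitive, which is slightly broader than the shell glob `tika*`.

```python
import os
import sysconfig


def find_entries(directory, prefix="tika"):
    """List entries in `directory` whose names start with `prefix` (case-insensitive)."""
    if not os.path.isdir(directory):
        return []
    return sorted(
        entry for entry in os.listdir(directory)
        if entry.lower().startswith(prefix.lower())
    )


if __name__ == "__main__":
    # purelib is where pip places pure-Python packages for this interpreter.
    site_packages = sysconfig.get_paths()["purelib"]
    print("site-packages:", site_packages)
    for entry in find_entries(site_packages):
        print("  found:", os.path.join(site_packages, entry))
```

The script only lists matches; removing them and reinstalling is left as a manual step.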

FWIW, `.initVM()` is no longer needed and is only left in for backward compatibility.
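In other words, the example script from the original report should be able to drop the `tika.initVM()` line entirely. A minimal sketch of that, assuming a working tika install and a reachable Tika server: `extract_text` and its `parse` parameter are hypothetical names introduced here so the function can be exercised without a server, while `parser.from_file` is the same call the original script uses.

```python
def extract_text(path, parse=None):
    """Return the extracted text of `path`, or an empty string if there is none."""
    if parse is None:
        # Imported lazily so an import problem surfaces here, with no initVM() call.
        from tika import parser
        parse = parser.from_file
    parsed = parse(path)
    # "content" can be None, for example for documents with no extractable text.
    content = parsed.get("content")
    return content.strip() if content else ""
```

Calling `extract_text("some.pdf")` uses tika directly; passing a stand-in for `parse` makes it possible to test the surrounding code without Java or a server.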