Closed amonaldo closed 5 years ago
Hi @amonaldo, what happens when you print(tika.TikaJarPath)?
@chrismattmann it prints the path of the module containing the jar file
@amonaldo, you need to pass an absolute path as the argument to dirname, which would look like this:
os.path.join(os.getcwd(), __file__)
Moreover, you need to override three variables of the tika module (log_path, TikaJarPath, and TikaFilesPath) to make your modified script work.
Modify your pdf.py (updating the filename):
import os

from tika import tika, parser

# Absolute path of the directory containing this file (and the .jar)
abs_path = os.path.dirname(os.path.join(os.getcwd(), __file__))

# Update the required variables
tika.log_path = os.getenv('TIKA_LOG_PATH', abs_path)
tika.TikaJarPath = os.getenv('TIKA_PATH', abs_path)
tika.TikaFilesPath = abs_path  # same directory as computed above


def get_pdf_text(path):
    # Parse the file with Tika and return the extracted text
    parsed = parser.from_file(path)
    return parsed['content']


if __name__ == "__main__":
    pdf_name = "TEST_FILE_NAME"  # filename to test
    print(get_pdf_text(pdf_name))
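As a complement to the script above, here is a minimal sketch of the same idea done through environment variables rather than by assigning to the module's attributes. It assumes the tika module reads TIKA_PATH and TIKA_LOG_PATH when it is first imported, as the os.getenv calls above suggest. The helper names module_dir and export_tika_paths are hypothetical and are not part of tika.

```python
import os


def module_dir(file_path):
    """Return the absolute directory that contains file_path."""
    # os.path.abspath resolves a relative path against the current working
    # directory, much like os.path.join(os.getcwd(), file_path), and also
    # normalizes the result.
    return os.path.dirname(os.path.abspath(file_path))


def export_tika_paths(directory, environ=os.environ):
    """Set TIKA_PATH and TIKA_LOG_PATH unless they are already defined.

    Hypothetical helper: values already present in the environment win,
    so a deployment can still override the location.
    """
    environ.setdefault("TIKA_PATH", directory)
    environ.setdefault("TIKA_LOG_PATH", directory)
    return environ["TIKA_PATH"], environ["TIKA_LOG_PATH"]


# Typical use at the top of a module, before tika is imported (assumption:
# the variables must be set before the import to take effect):
#     export_tika_paths(module_dir(__file__))
```

Whether this behaves any differently from overriding the attributes directly has not been verified in this thread, so treat it as something to try rather than a confirmed fix.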
@RafayGhafoor I tried but still the same error.
@amonaldo, have you tried restarting your computer or killing the Tika server? An instance of the server keeps running in the background.
@RafayGhafoor I'm using Flask and I always restart the server, which causes the Java instance to be destroyed
@amonaldo, can you try it in a separate test module that does not require Flask? Then perhaps we can debug.
@RafayGhafoor it works outside Flask. I don't know why it fails when Flask is running
@amonaldo, perhaps the module that uses Flask is in a separate directory. Can you try moving the Tika-related files into the same directory and see if the same error occurs?
The same thing happens. I have a file called run.py that runs Flask, and even when I moved the jar file to the same directory it just doesn't work.
@amonaldo, Can you show me your run.py code to see how it's using Tika?
@RafayGhafoor this is the code
from smartcv.web import app
from waitress import serve

if __name__ == "__main__":
    try:
        serve(app, port=8080, host='0.0.0.0')
    except Exception as e:
        print(str(e))
I'm using waitress to serve the Flask app, which is defined in another module
@amonaldo, how and where is it using Tika?
P.S. Drop me an email, since this issue doesn't seem to be a bug in Tika.
@RafayGhafoor Thanks for your time, but I have found a solution although it's not perfect.
I realized that I can get the user home directory using the os module:
tika.TikaJarPath = os.path.expanduser("~")
This way Tika works fine and without any problem.
I'm working on a Python module that uses Tika, and I'm trying to use a custom jar file so that it does not get downloaded each time.
I have already placed the jar file and the md5 file inside the module.
Tika does not work and this is the output:
The problem happens when the jar file is inside the module. It works if I specify another location, but that's not an option because when I deploy the Python module, I need the module to contain the jar file.