chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Apache License 2.0

permission denied for log_file /tmp/tika in web server with virtualenv #150

Closed pgrandinetti closed 7 years ago

pgrandinetti commented 7 years ago

Installing tika-python on an EC2 server (with a Django app and a virtualenv) leads to a Permission Denied error when executing the following line: https://github.com/chrismattmann/tika-python/blob/master/tika/tika.py#L142

It looks like the temp file is created within the virtualenv folder, so it cannot be written by the web server?

Cheers

pgrandinetti commented 7 years ago

Of course, running chmod 777 on the /tmp/tika* files works, but an out-of-the-box solution would be nicer.

chrismattmann commented 7 years ago

Do you have a suggested patch, @pgrandinetti? We are just calling the Python temp file creation facility. Perhaps we could pass a path to use? But that would make it more brittle, no?

pgrandinetti commented 7 years ago

As a colleague suggested, setting TIKA_LOG_PATH to a folder writable by the web server is already a better solution than my chmod. Nevertheless, it seems to me that the downloaded .jar file goes to /tmp anyway. Maybe your code could create a directory of its own and put all the needed files there?
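For anyone landing here with the same error, the TIKA_LOG_PATH workaround can be sketched roughly as follows. This is only an illustration: the helper name configure_tika_log_dir and the tika-webapp directory are made up for the example, and the sketch assumes the variable has to be set before the tika module is imported.

```python
import os
import tempfile


def configure_tika_log_dir(log_dir):
    """Create log_dir if needed and point TIKA_LOG_PATH at it.

    Hypothetical helper, not part of tika-python.
    """
    os.makedirs(log_dir, exist_ok=True)
    if not os.access(log_dir, os.W_OK):
        raise PermissionError("log directory is not writable: " + log_dir)
    os.environ["TIKA_LOG_PATH"] = log_dir
    return log_dir


# Illustrative location; in a real deployment pick a directory owned by
# the user the web server runs as.
log_dir = configure_tika_log_dir(
    os.path.join(tempfile.gettempdir(), "tika-webapp")
)

# Assumption: tika reads the variable at import time, so import it only
# after the environment has been prepared.
# from tika import parser
```

In a Django project this would most likely go near the top of settings.py or wsgi.py, so that it runs before any view imports tika. If I recall correctly, the README also lists a TIKA_PATH variable for the jar location, but check that against your installed version.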

Another small issue (not strictly related): if I am not wrong, when I call parse_file twice in a row, the server jar is downloaded both times. Wouldn't it be better to reuse it after the first download?

Cheers

chrismattmann commented 7 years ago

Hi @pgrandinetti, so TIKA_LOG_PATH seems to be a valid workaround. The downloaded jar is in fact intended to go to /tmp (so that it disappears when the computer is restarted and is then downloaded again). Also, I do not see the behavior you describe when calling parse_file twice: the second call is much faster after the download, since the server is already running. Please produce a unit test that exposes this behavior if it exists.

Otherwise I think I will close this as won't fix, since there is an easy workaround.
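One possible starting point for the requested unit test is to check whether the jar file on disk is rewritten by a second parse call. The sketch below is only a suggestion: file_changed_during is a hypothetical helper, and the jar location in the commented usage is an assumption, not something confirmed in this thread.

```python
import os


def file_changed_during(path, action):
    """Run action() and report whether path was created or rewritten meanwhile.

    Hypothetical helper: compares modification time and size before and after.
    """
    def snapshot():
        try:
            st = os.stat(path)
        except FileNotFoundError:
            return None
        return (st.st_mtime_ns, st.st_size)

    before = snapshot()
    action()
    return snapshot() != before


# Possible usage inside a test (requires tika, Java and network access):
# import tempfile
# from tika import parser
# jar = os.path.join(tempfile.gettempdir(), "tika-server.jar")  # assumed location
# parser.from_file("sample.pdf")  # first call may download the jar
# assert not file_changed_during(jar, lambda: parser.from_file("sample.pdf"))
```

If the assertion in the usage fails, the jar was touched again on the second call, which would support the re-download report. If it passes, the second call reused the existing file.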