pgrandinetti closed this issue 7 years ago
Of course, chmod 777 /tmp/tika* works, but an out-of-the-box solution would be nicer.
Do you have a suggested patch, @pgrandinetti? We are just calling the Python temp file creation facility. Perhaps we could pass a path to use? But that would make it more brittle, no?
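For reference, a minimal sketch of what "passing a path" to the standard temp file facility could look like. This is not the library's actual code: `make_log_file` and the `tika-` prefix are hypothetical names used only for illustration. The only real API relied on is the `dir` argument of `tempfile.mkstemp`.

```python
import os
import tempfile


def make_log_file(directory=None):
    """Create an empty temp file, optionally inside a caller-supplied directory.

    Hypothetical helper: directory=None keeps the default behaviour
    (the system temp dir), so existing callers would not change.
    """
    fd, path = tempfile.mkstemp(prefix="tika-", suffix=".log", dir=directory)
    os.close(fd)  # only the path is needed here, not the open descriptor
    return path


# Stand-in for a folder the web server user is allowed to write to.
writable_dir = tempfile.mkdtemp(prefix="tika-demo-")
path = make_log_file(writable_dir)
print(path)
```

Leaving the argument optional is one way to limit the brittleness concern, since nothing changes unless a caller explicitly passes a directory.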
As a colleague suggested, setting TIKA_LOG_PATH to a folder writable by the web server is already a better solution than my chmod.
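A rough sketch of that workaround from Python, assuming the library reads TIKA_LOG_PATH from the environment when it is imported (so the variable has to be set first). The helper name `configure_tika_log_dir` and the `tika-logs` folder name are made up for this example; in a real deployment the variable would more likely be set in the web server's environment configuration.

```python
import os
import tempfile


def configure_tika_log_dir(base_dir=None):
    """Create a writable directory and export it as TIKA_LOG_PATH.

    Hypothetical helper, not part of tika-python.
    """
    base_dir = base_dir or tempfile.gettempdir()
    log_dir = os.path.join(base_dir, "tika-logs")
    os.makedirs(log_dir, exist_ok=True)
    if not os.access(log_dir, os.W_OK):
        raise PermissionError("log directory is not writable: " + log_dir)
    os.environ["TIKA_LOG_PATH"] = log_dir
    return log_dir


log_dir = configure_tika_log_dir()

# Import tika only after the variable is set (assumption: it is read at import time).
# from tika import parser
```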
Nevertheless, it seems to me that the downloaded .jar file goes to /tmp anyway.
Maybe the library could create its own directory and put all the needed files there?
Another small issue (not strictly related): if I am not wrong, when I call parse_file twice in a row, the server jar is downloaded both times. Wouldn't it be better to reuse it after the first download?
Cheers
Hi @pgrandinetti, so TIKA_LOG_PATH seems to be a valid workaround. The intent is in fact for the downloaded jar to go to /tmp (so that it disappears when the computer is restarted and is then downloaded again). Also, I do not see the behavior you describe when calling parse_file twice: the second call is much faster after the download, since the server is already running. Please produce a unit test that exposes this behavior if it exists.
Otherwise I think I will close this as won't fix, since there is an easy workaround.
Installing python-tika on an EC2 server (with a Django app and virtualenv) leads to a Permission Denied error when executing the following line: https://github.com/chrismattmann/tika-python/blob/master/tika/tika.py#L142
It looks like the temp file is created within the virtualenv folder, so it cannot be written by the web server?
Cheers