chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Apache License 2.0

Exception when trying to parse on Windows 10: AttributeError: module 'os' has no attribute 'setsid' #278

Closed stephen-farris-jhuapl-edu closed 4 years ago

stephen-farris-jhuapl-edu commented 4 years ago

I'm using tika 1.23 successfully on Python 3.7.4 on one Windows 10 machine. However, I installed tika 1.23.1 (the latest version) on another Windows 10 machine running Python 3.8.1, and I get an exception when I try to parse files. For example, tika.parser.from_file("PATH_TO_MY_PDF_FILE.pdf") raises this exception: AttributeError: module 'os' has no attribute 'setsid'. (NOTE: I am initializing the VM before making this call.)

I dug into the tika source code, and found the offending line of code in tika.py:

666: TikaServerProcess = Popen(cmd_string, stdout=logFile, stderr=STDOUT, shell=True, preexec_fn=os.setsid)

The offending line references os.setsid, but setsid does not exist in the os module on Windows per the docs (quoted below):

https://docs.python.org/3.8/library/os.html

os.setsid()

Call the system call setsid(). See the Unix manual for the semantics.

Availability: Unix.
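To make the availability note concrete, here is a minimal sketch of how code can check at runtime whether os.setsid exists before referencing it. The helper name has_setsid is hypothetical and is not part of tika; the check relies only on the standard library.

```python
import os
import sys


def has_setsid() -> bool:
    """Report whether this Python build exposes os.setsid (hypothetical helper)."""
    # os.setsid is only defined on Unix-like platforms, so referencing it
    # directly on Windows raises AttributeError at the point of access.
    return hasattr(os, "setsid")


if __name__ == "__main__":
    # Prints the platform name and whether os.setsid is available on it.
    print(sys.platform, has_setsid())
```

Guarding with hasattr avoids the AttributeError because the attribute is never looked up unconditionally.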

I searched through the tika commit history on GitHub and found that this issue was introduced in this commit: https://github.com/chrismattmann/tika-python/blob/431f024d9f0862599421c27afec9076ecf29c2c3/tika/tika.py.

Prior to the aforementioned commit, the line of code in question looked like this, with no reference to os.setsid:

665: cmd = Popen(cmd_string, stdout=logFile, stderr=STDOUT, shell=True)

Here's the diff that shows where the issue was introduced: https://github.com/chrismattmann/tika-python/commit/431f024d9f0862599421c27afec9076ecf29c2c3#diff-79bb8c4ed90a3c7e927d1091e49a6680

This issue is preventing me from using the current version of tika on Windows. I'm going to have to downgrade to version 1.23 until this is fixed.
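For illustration, one possible cross-platform way to start the server process is sketched below. This is an assumption about how such a guard could look, not the change made in tika itself; the function names session_kwargs and start_process are hypothetical. It uses the standard subprocess options start_new_session (POSIX) and CREATE_NEW_PROCESS_GROUP (Windows only) instead of preexec_fn=os.setsid.

```python
import os
import subprocess
from subprocess import STDOUT, Popen


def session_kwargs() -> dict:
    """Return Popen keyword arguments that detach the child from the parent's group.

    Hypothetical helper, not part of tika.
    """
    if os.name == "nt":
        # Windows has no setsid(); a new process group is the closest analogue.
        # CREATE_NEW_PROCESS_GROUP only exists on Windows, so it is referenced
        # only inside this branch.
        return {"creationflags": subprocess.CREATE_NEW_PROCESS_GROUP}
    # On POSIX, start_new_session=True makes the child call setsid() itself,
    # so os.setsid never has to be referenced by the parent.
    return {"start_new_session": True}


def start_process(cmd_string: str, log_file) -> Popen:
    """Start a shell command, sending stdout and stderr to log_file."""
    return Popen(
        cmd_string,
        stdout=log_file,
        stderr=STDOUT,
        shell=True,
        **session_kwargs(),
    )
```

A caller would pass the same command string and open log file that the original Popen call used, for example start_process(cmd_string, logFile).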

sunandabansal commented 4 years ago

I ran into the same issue. Thanks to @stephen-farris-jhuapl-edu's comment, downgrading to 1.23 worked for me.

chrismattmann commented 4 years ago

We have a fix for this in #280; I'll be applying it shortly. Thank you. I can push a 1.23.2 release this week.

Rik-de-Kort commented 4 years ago

Same issue here. Looking forward to a fix!

chrismattmann commented 4 years ago

fixed in #280

lgtateos commented 4 years ago

I installed tika through Anaconda today and I am getting the "AttributeError: module 'os' has no attribute 'setsid'" exception. I'm on Python 3.6 on Windows 10.

garyng commented 4 years ago

It seems 1.23.2 has not been released yet. May I know when it will be released?

chrismattmann commented 4 years ago

Hi @garyng, I'll try to release it this week.

jai890216 commented 4 years ago

Hi, although I downgraded tika to 1.23, the same setsid issue still occurred. Any suggestions until the issue is fixed?

chrismattmann commented 4 years ago

You need to downgrade to 1.23.0 until I make the updated release.

rafaleo commented 4 years ago

Same here; I can neither upgrade nor downgrade in Anaconda Nav.

chrismattmann commented 4 years ago

@rafaleo this has been pushed in 1.24, so it should be good. Upgrade now.