deanmalmgren / textract

extract text from any document. no muss. no fuss.
http://textract.readthedocs.io
MIT License

Cannot Install with other packages due to `~=` #515


Eboubaker commented 3 weeks ago

If possible, can the `~=` pins be replaced with `>=`? I cannot install this library in a big project with many other dependencies: https://github.com/deanmalmgren/textract/blob/ec3c0c3c982078d22e51cc2753baeaf48cdf2e19/requirements/python#L11C1-L12C1

Issue 1

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
textract 1.6.5 requires beautifulsoup4~=4.8.0, but you have beautifulsoup4 4.9.0 which is incompatible.

Issue 2

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pyarabic 0.6.15 requires six>=1.14.0, but you have six 1.12.0 which is incompatible.
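
Both conflicts above come down to how the compatible-release operator `~=` behaves: `~=4.8.0` means `>=4.8.0, ==4.8.*`, so it rejects 4.9.0, whereas `>=4.8.0` would accept it. The following is a minimal sketch of that rule for plain numeric versions only. The helper names (`parse`, `compatible_release`, `at_least`) are made up for illustration and are not part of pip or textract; real tools also handle pre-releases, epochs, and other cases that this ignores.

```python
def parse(version):
    """Turn a plain dotted release string such as "4.8.0" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def pad(release, length):
    """Right-pad a release tuple with zeros so that 4.8 compares equal to 4.8.0."""
    return release + (0,) * (length - len(release))


def compatible_release(spec, candidate):
    """Return True if `candidate` satisfies `~=spec` (hypothetical helper)."""
    base = parse(spec)
    if len(base) < 2:
        raise ValueError("~= needs at least two release segments")
    width = max(len(base), len(parse(candidate)))
    cand = pad(parse(candidate), width)
    # ~=X.Y.Z is >=X.Y.Z plus "same X.Y prefix": every segment but the last is frozen.
    prefix = base[:-1]
    return cand >= pad(base, width) and cand[:len(prefix)] == prefix


def at_least(spec, candidate):
    """Return True if `candidate` satisfies `>=spec` (hypothetical helper)."""
    width = max(len(parse(spec)), len(parse(candidate)))
    return pad(parse(candidate), width) >= pad(parse(spec), width)


# The pins quoted in the errors above:
print(compatible_release("4.8.0", "4.9.0"))   # beautifulsoup4~=4.8.0 vs 4.9.0
print(compatible_release("1.12.0", "1.14.0")) # six~=1.12.0 vs pyarabic's minimum
print(at_least("1.12.0", "1.14.0"))           # what six>=1.12.0 would allow
```

With `~=`, only patch releases of the pinned minor version are accepted, which is why any other package that needs a newer minor version cannot be installed alongside it.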

simonschmidt commented 2 weeks ago

For me, this library causes issues with boto3 in Lambda: boto3 imports dateutil, which imports six via `from six.moves import ...`, and that does not work on Python 3.12 with old versions of six.


  File "/usr/lib/python3.12/site-packages/dateutil/tz/tz.py", line 21, in <module>
    from six.moves import _thread
ModuleNotFoundError: No module named 'six.moves'

Trying to explicitly set versions fails during install (which is good, in its own way):

venv $ python --version
Python 3.12.3

venv $ pip install 'textract>=1.6.5' 'six>=1.16.0'
....
│ INFO: pip is looking at multiple versions of textract to determine which version is compatible with other requirements. This could take a while.
│ ERROR: Cannot install -r requirements.txt (line 1) and six>=1.16.0 because these package versions have conflicting dependencies.
│ 
│ The conflict is caused by:
│     The user requested six>=1.16.0
│     textract 1.6.5 depends on six~=1.12.0
...

But the old six version works fine with Python 3.11, so for now I work around this by using an older Python version.

It's also possible to manually update six after the normal package install to bypass the compatibility checks, but that would add a bit too much complexity to my existing Lambda build flow.
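
For reference, the two-step workaround described above could be sketched as a small build helper. This is an illustration only, not a recommended or tested Lambda build flow: the function names are hypothetical, and it assumes that the second `pip install --upgrade` reports the conflict with textract's pin but still installs the newer six.

```python
import subprocess
import sys


def install_commands(package="textract", override="six>=1.16.0"):
    """Build the two pip invocations: normal install first, then the override.

    Both default arguments are taken from the thread; the function itself is
    a hypothetical helper.
    """
    pip = [sys.executable, "-m", "pip", "install"]
    return [
        pip + [package],                # step 1: install textract with its pinned six
        pip + ["--upgrade", override],  # step 2: replace six afterwards
    ]


def run(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# In a build script one would call:
# run(install_commands())
```

The order matters: installing both requirements in a single command is what triggers the resolver error shown earlier, so the override has to be a separate step.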