deanmalmgren / textract

extract text from any document. no muss. no fuss.
http://textract.readthedocs.io
MIT License

textract3-1.6.4.post1 and textract-1.6.5 compilation error: error in beautifulsoup4 setup command: use_2to3 is invalid. #464

Open ashish-2022 opened 1 year ago

ashish-2022 commented 1 year ago

Describe the bug: When installing textract3-1.6.4.post1 or textract-1.6.5 from source, we get the following error:

bash-4.2$ pip3 install --no-binary :all: textract3
Collecting textract3
  Downloading textract3-1.6.4.post1.tar.gz (16 kB)
  Preparing metadata (setup.py) ... done
Collecting argcomplete~=1.10.0 (from textract3)
  Using cached argcomplete-1.10.3.tar.gz (50 kB)
  Preparing metadata (setup.py) ... done
Collecting beautifulsoup4~=4.8.0 (from textract3)
  Using cached beautifulsoup4-4.8.2.tar.gz (298 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [1 lines of output]
      error in beautifulsoup4 setup command: use_2to3 is invalid.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
bash-4.2$
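For context, `use_2to3` is a setup keyword that setuptools stopped accepting in release 58.0.0, so a source distribution whose setup.py still passes it fails at the metadata step. Below is a minimal sketch for checking a setuptools version against that cutoff. The helper name `rejects_use_2to3` is made up for illustration, and the sketch assumes pip is using the environment's own setuptools, as the legacy `setup.py egg_info` step in the log suggests.

```python
def rejects_use_2to3(setuptools_version: str) -> bool:
    """Return True if this setuptools version no longer accepts use_2to3.

    Assumes the cutoff is setuptools 58.0.0, the release that removed
    2to3 support during builds. Only the leading major number is inspected.
    """
    major = int(setuptools_version.split(".")[0])
    return major >= 58


if __name__ == "__main__":
    # importlib.metadata is in the standard library from Python 3.8.
    from importlib.metadata import PackageNotFoundError, version

    try:
        installed = version("setuptools")
    except PackageNotFoundError:
        print("setuptools is not installed in this environment")
    else:
        print(installed, "rejects use_2to3:", rejects_use_2to3(installed))
```

If this reports True, any dependency that still declares `use_2to3` in its setup.py would be expected to fail the same way when built from source.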

To Reproduce: Steps to reproduce the behavior:

  1. Install Python 3.10.8.
  2. Run the command: pip3 install --no-binary :all: textract3
  3. The same error occurs if we download and extract textract3-1.6.4.post1.tar.gz from PyPI and install it locally with: pip3 install python_packages/textract3-1.6.4.post1/
  4. Based on my initial analysis, the beautifulsoup4 version required by textract3 and textract needs to be upgraded.
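To illustrate why pip keeps selecting beautifulsoup4 4.8.2 here: the requirement in the log is `beautifulsoup4~=4.8.0`, and the `~=` (compatible release) operator only admits 4.8.x releases. The following is a simplified sketch of that rule using plain dotted versions only. The function names are hypothetical, the candidate strings are just examples, and real tools use a full PEP 440 parser rather than this shortcut.

```python
def parse(version: str) -> tuple:
    """Split a plain dotted version like '4.8.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies_compatible_release(candidate: str, pinned: str) -> bool:
    """Simplified check for the '~=' (compatible release) operator.

    '~=4.8.0' means '>=4.8.0, ==4.8.*': the candidate must be at least the
    pinned version and share every component except the last one.
    Pre-, post- and dev-release suffixes are deliberately not handled.
    """
    cand, pin = parse(candidate), parse(pinned)
    # Pad a shorter candidate with zeros so '4.8' compares like '4.8.0'.
    cand = cand + (0,) * (len(pin) - len(cand))
    prefix = pin[:-1]
    return cand >= pin and cand[:len(prefix)] == prefix


for candidate in ("4.8.0", "4.8.2", "4.9.0", "4.12.3"):
    print(candidate, satisfies_compatible_release(candidate, "4.8.0"))
```

Only the 4.8.x candidates pass, which is consistent with pip resolving to 4.8.2 regardless of what newer releases exist.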

Expected behavior: The package should compile and install from the source tar.gz file without any errors.


ashish-2022 commented 1 year ago

I'm not an expert on this package, but as far as I can tell we have to upgrade the beautifulsoup4 version in the following files:

bash-4.2$ vim python_packages/textract3-1.6.4.post1/requirements/python
bash-4.2$ vim python_packages/textract3-1.6.4.post1/textract3.egg-info/requires.txt
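As a rough sketch of what that manual edit amounts to, the snippet below swaps the specifier on one requirement line and leaves the rest alone. The function name `replace_pin` is hypothetical, the sample text only mirrors the two pins visible in the pip log, and the replacement specifier is a placeholder. Which beautifulsoup4 versions textract actually works with is not established in this thread.

```python
import re


def replace_pin(requirements_text: str, package: str, new_specifier: str) -> str:
    """Swap the version specifier on the line that pins `package`.

    Lines for other packages, comments, and blank lines pass through untouched.
    """
    # Match the package name at line start, followed by a specifier operator.
    pattern = re.compile(
        rf"^(?P<name>{re.escape(package)})\s*(?:~=|==|>=|<=|!=|<|>).*$",
        re.IGNORECASE | re.MULTILINE,
    )
    return pattern.sub(lambda m: m.group("name") + new_specifier, requirements_text)


# Sample content mirroring the two pins visible in the pip log above.
sample = "argcomplete~=1.10.0\nbeautifulsoup4~=4.8.0\n"

# ">=4.8.0" is a placeholder, not a tested recommendation.
print(replace_pin(sample, "beautifulsoup4", ">=4.8.0"), end="")
```

Applying the same change to both files would keep the two requirement lists in agreement, although a real fix would more likely come from a new upstream release than from patching an unpacked tarball.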

beautifulsoup4 has fixed this in its later versions; see: https://bugs.funtoo.org/browse/FL-8788?page=com.atlassian.jira.plugin.system.issuetabpanels%3Achangehistory-tabpanel