deanmalmgren / textract

extract text from any document. no muss. no fuss.
http://textract.readthedocs.io
MIT License

textract.exceptions.ShellError: The command antiword is not installed on your system. Please make sure the appropriate dependencies are installed before using textract #444

Open faridelya opened 1 year ago

faridelya commented 1 year ago

Cannot execute antiword in production under Gunicorn, while in development on the same computer it works.

I installed all the dependencies on Ubuntu before installing textract. Here is the apt output:

Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'python-dev-is-python2' instead of 'python-dev'
libjpeg-dev is already the newest version (8c-2ubuntu8).
antiword is already the newest version (0.37-16).
flac is already the newest version (1.3.3-1build1).
lame is already the newest version (3.100-3).
libmad0 is already the newest version (0.15.1b-10ubuntu1).
libsox-fmt-mp3 is already the newest version (14.4.2+git20190427-2).
pstotext is already the newest version (1.9-6build1).
python-dev-is-python2 is already the newest version (2.7.17-4).
sox is already the newest version (14.4.2+git20190427-2).
swig is already the newest version (4.0.1-5build1).
tesseract-ocr is already the newest version (4.1.1-2build2).
unrtf is already the newest version (0.21.10-clean-1).
libxml2-dev is already the newest version (2.9.10+dfsg-5ubuntu0.20.04.4).
libxslt1-dev is already the newest version (1.1.34-4ubuntu0.20.04.1).
poppler-utils is already the newest version (0.86.1-0ubuntu1.1).
ffmpeg is already the newest version (7:4.2.7-0ubuntu0.1).
0 upgraded, 0 newly installed, 0 to remove and 44 not upgraded.
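apt reporting a package as installed does not guarantee that a given process can find its executable. A minimal sketch, assuming the package-to-command mapping below (e.g. poppler-utils providing pdftotext, tesseract-ocr providing tesseract), that checks which commands resolve on the PATH of whatever process runs it. The helper name `missing_commands` and the `REQUIRED_COMMANDS` table are hypothetical, not part of textract.

```python
import shutil

# Hypothetical mapping from the apt packages listed above to the
# executables they are assumed to provide.
REQUIRED_COMMANDS = {
    "antiword": "antiword",
    "poppler-utils": "pdftotext",
    "tesseract-ocr": "tesseract",
    "unrtf": "unrtf",
    "pstotext": "pstotext",
    "sox": "sox",
    "ffmpeg": "ffmpeg",
    "flac": "flac",
    "lame": "lame",
}


def missing_commands(commands, path=None):
    """Return the sorted package names whose command is not found.

    When `path` is None, shutil.which searches the PATH of the current
    process, so the result depends on how the process was started.
    """
    return sorted(
        package
        for package, command in commands.items()
        if shutil.which(command, path=path) is None
    )


if __name__ == "__main__":
    missing = missing_commands(REQUIRED_COMMANDS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All commands found on PATH")
```

Running it once from a login shell and once from inside the deployed app would show whether the two environments disagree.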

All of the following was done on the same server.

Problem:

When I applied my changes by running sudo systemctl restart gunicorn.service and sudo systemctl restart nginx, Gunicorn logged the following:

Oct 27 06:16:23 ip-172-31-77-202 gunicorn[389992]: byte_string = self.extract(filename, **kwargs)
Oct 27 06:16:23 ip-172-31-77-202 gunicorn[389992]: File "/home/ubuntu/web-server/env/lib/python3.7/site-packages/textract/parsers/doc_parser.py", line 9, in extract
Oct 27 06:16:23 ip-172-31-77-202 gunicorn[389992]: stdout, stderr = self.run(['antiword', filename])
Oct 27 06:16:23 ip-172-31-77-202 gunicorn[389992]: File "/home/ubuntu/web-server/env/lib/python3.7/site-packages/textract/parsers/utils.py", line 96, in run
Oct 27 06:16:23 ip-172-31-77-202 gunicorn[389992]: ' '.join(args), 127, '', '',
Oct 27 06:16:23 ip-172-31-77-202 gunicorn[389992]: textract.exceptions.ShellError: The command antiword /home/ubuntu/web-server/data/test_cvs/Yassin.docx failed because the executable
Oct 27 06:16:23 ip-172-31-77-202 gunicorn[389992]: antiword is not installed on your system. Please make
Oct 27 06:16:23 ip-172-31-77-202 gunicorn[389992]: sure the appropriate dependencies are installed before using
Oct 27 06:16:23 ip-172-31-77-202 gunicorn[389992]: textract:
Oct 27 06:16:23 ip-172-31-77-202 gunicorn[389992]: http://textract.readthedocs.org/en/latest/installation.html
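Judging from the traceback, textract's doc parser calls `self.run(['antiword', filename])`, and `utils.run` raises `ShellError` with exit code 127 and empty output. That pattern suggests the executable was never started at all, rather than antiword running and failing. The sketch below is a hypothetical, simplified re-creation of that pattern (not textract's actual code): `subprocess.Popen` raises `FileNotFoundError` when the command cannot be found on the process's PATH, and the sketch reports that as 127, the conventional "command not found" status.

```python
import subprocess


def run(args):
    """Hypothetical simplified runner: returns (returncode, stdout, stderr).

    If the executable cannot be located on the current process's PATH,
    Popen raises FileNotFoundError before any child process starts; this
    sketch maps that case to exit code 127 with empty output.
    """
    try:
        proc = subprocess.Popen(
            args, stdout=subprocess.PIPE, stderr=subprocess.PIPE
        )
    except FileNotFoundError:
        return 127, b"", b""
    stdout, stderr = proc.communicate()
    return proc.returncode, stdout, stderr
```

Because the lookup happens inside the Python process, what matters is the PATH that the Gunicorn worker was started with, not the PATH of the shell where `which antiword` succeeds.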

ubuntu@:~/web-server/data/test_cvs$ which antiword
/usr/bin/antiword

I also uninstalled and reinstalled antiword, but the problem still exists. I'm stuck: it doesn't work in production, but on port 8000 it works and I get output. Why can't Gunicorn execute antiword? Any help would be appreciated. Thanks.

Python version: 3.7
OS: Ubuntu 20.04
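The `which antiword` check above runs in an interactive shell, which may not have the same PATH as the Gunicorn service. The issue does not show the Gunicorn unit file, so this is only an assumption: one possible cause is a service environment whose PATH contains just the virtualenv's bin directory (for example `/home/ubuntu/web-server/env/bin`), which would hide `/usr/bin/antiword`. A minimal diagnostic sketch, with a hypothetical helper `describe_lookup`, that reports the PATH a process uses and where a command resolves on it:

```python
import os
import shutil


def describe_lookup(command, path=None):
    """Report the PATH entries used and where `command` resolves.

    With path=None this reads the current process's PATH, so calling it
    from inside the Gunicorn-served app shows what the worker sees.
    """
    effective_path = path if path is not None else os.environ.get("PATH", os.defpath)
    return {
        "PATH": effective_path.split(os.pathsep),
        "resolved": shutil.which(command, path=effective_path),
    }


if __name__ == "__main__":
    # Assumed restricted PATH, for comparison only; not taken from the issue.
    print(describe_lookup("antiword", "/home/ubuntu/web-server/env/bin"))
    # PATH of the process actually running this script.
    print(describe_lookup("antiword"))
```

Logging `describe_lookup("antiword")` from within the deployed application, and comparing it with the output of the same call under `python` on port 8000, would show whether the two processes resolve antiword differently.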