deanmalmgren / textract

extract text from any document. no muss. no fuss.
http://textract.readthedocs.io
MIT License
3.89k stars 599 forks

Textract.process not working...MacOS #252

Closed animishgautam closed 5 years ago

animishgautam commented 6 years ago

I am getting the following error:

Traceback (most recent call last):

  File "<ipython-input-25-c7a4aaeff23c>", line 104, in <module>
    org_text.append(textract.process(filepath[i], encoding='utf-8'))

  File "/anaconda3/lib/python3.6/site-packages/textract/parsers/__init__.py", line 77, in process
    return parser.process(filename, encoding, **kwargs)

  File "/anaconda3/lib/python3.6/site-packages/textract/parsers/utils.py", line 47, in process
    unicode_string = self.decode(byte_string)

  File "/anaconda3/lib/python3.6/site-packages/textract/parsers/utils.py", line 64, in decode
    result = chardet.detect(text)

  File "/anaconda3/lib/python3.6/site-packages/chardet/__init__.py", line 30, in detect
    u.feed(aBuf)

  File "/anaconda3/lib/python3.6/site-packages/chardet/universaldetector.py", line 125, in feed
    self._mCharSetProbers = [MBCSGroupProber(), SBCSGroupProber(),

  File "/anaconda3/lib/python3.6/site-packages/chardet/mbcsgroupprober.py", line 43, in __init__
    CharSetGroupProber.__init__(self)

TypeError: __init__() takes 1 positional argument but 2 were given

Sometimes I get the following instead: super(type, obj): obj must be an instance or subtype of type

The package was working fine before. For some reason my environment got updated to Python 3.7, and I then went back to Python 3.6, but now the package shows the errors above.

Sorry, I am new to Python, so let me know if I haven't given you the full information.

Thanks.

jpweytjens commented 5 years ago

Could you provide a test file and the code you used to process this file with textract? If possible, try again using the latest version of textract.
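
Since the traceback ends inside chardet rather than textract, a short environment report alongside the test file would probably help narrow this down. The following is only a sketch of what such a report might look like: the helper names `describe_module` and `environment_report` are made up for illustration and are not part of textract or chardet, and it assumes each module exposes a `__version__` attribute (it falls back to "unknown" when it does not).

```python
import importlib
import sys


def describe_module(name):
    """Try to import `name` and report its version and install path.

    Hypothetical helper for illustration; not part of textract or chardet.
    """
    try:
        module = importlib.import_module(name)
    except Exception as exc:  # broad on purpose: a broken install can raise anything
        return {"name": name, "error": "{}: {}".format(type(exc).__name__, exc)}
    return {
        "name": name,
        # Assumption: the module exposes __version__; otherwise report "unknown".
        "version": getattr(module, "__version__", "unknown"),
        "path": getattr(module, "__file__", "unknown"),
    }


def environment_report(names=("textract", "chardet")):
    """Build a plain-text summary of the interpreter and the given modules."""
    lines = ["python {} ({})".format(sys.version.split()[0], sys.executable)]
    for name in names:
        info = describe_module(name)
        if "error" in info:
            lines.append("{} failed to import: {}".format(name, info["error"]))
        else:
            lines.append("{} {} ({})".format(name, info["version"], info["path"]))
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Running this in the same IPython session or interpreter that raised the error, and pasting the output here, would show which Python version is active and which site-packages directory each package is loaded from.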

jpweytjens commented 5 years ago

I'm closing this issue due to inactivity. If you still encounter the issue with the latest version of textract, feel free to leave a comment with additional information and I'll reopen the issue.