deanmalmgren / textract

extract text from any document. no muss. no fuss.
http://textract.readthedocs.io
MIT License
3.91k stars, 608 forks

Python2 deprecation notice #390

Open traverseda opened 3 years ago

traverseda commented 3 years ago

Some of the libraries textract uses no longer support python2 in their newer releases. Since python2 is EOL, I don't think there's any problem with dropping support for it.

My intent is to do one more release with various bug fixes for python2, and then do a minor version bump named 1.7.0 that incorporates python3-only libraries.

Presuming that sounds good to everyone, 1.6.4 is likely to be the last release supporting python2.

deanmalmgren commented 3 years ago

That sounds reasonable to me. Thanks!

0LL13 commented 3 years ago

Oh, that's really cool. The "extract" method in pdf_parser.py says it gracefully falls back on pdfminer if pdftotext is not installed, but the requirements pin pdfminer.six==20191110 for python2 support, which mucks up other packages and doesn't seem to make sense to me if pdftotext is the package of choice anyway.
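
For readers unfamiliar with the fallback being described, here is a rough sketch of the pattern: try the pdftotext command-line tool first, and only import pdfminer if that tool is missing. This is not textract's actual code. The function names (`extract_with_pdftotext`, `extract_with_pdfminer`) and the `backends` parameter are hypothetical, and it assumes a pdfminer.six release that provides `pdfminer.high_level.extract_text`.

```python
import shutil
import subprocess


def extract_with_pdftotext(path):
    """Run the pdftotext CLI and return the text it writes to stdout."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext is not on PATH")
    result = subprocess.run(
        ["pdftotext", path, "-"],  # "-" sends the output to stdout
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def extract_with_pdfminer(path):
    """Use pdfminer.six, imported lazily so it stays an optional dependency."""
    from pdfminer.high_level import extract_text

    return extract_text(path)


def extract(path, backends=(extract_with_pdftotext, extract_with_pdfminer)):
    """Return the text from the first backend that is available.

    A backend counts as unavailable if it raises FileNotFoundError
    (missing binary) or ImportError (missing package). Any other
    error is treated as a real failure and propagates to the caller.
    """
    errors = []
    for backend in backends:
        try:
            return backend(path)
        except (FileNotFoundError, ImportError) as exc:
            errors.append(f"{backend.__name__}: {exc}")
    raise RuntimeError("no PDF backend available: " + "; ".join(errors))
```

Because pdfminer is only imported inside its own backend, a setup like this would not need a hard pin on it when pdftotext is present, which is the point raised above.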

traverseda commented 3 years ago

Sorry about not being able to move as fast on this as I wanted. There was a death in my family, alongside my job rather suddenly having difficulty paying me, and on the whole I've had a hard time finding the time to work on this. Sorry about that.

0LL13 commented 3 years ago

Sorry to hear that!

tylerganter commented 1 year ago

Is there any update on this? This project seems to have endless dependency issues due to continued python2 support.

seankfh commented 1 year ago

Created "Error in textract setup command w/ extract-msg<=0.29. due to Wheel 0.40.0" https://github.com/deanmalmgren/textract/issues/461. I believe `extract-msg<=0.29.` is only necessary for Python 2. Can we get consensus on a fork like this https://github.com/deanmalmgren/textract/pull/433 @deanmalmgren?