PyYoshi / cChardet

universal character encoding detector

Support Python 3.10 #77

Open decaz opened 2 years ago

decaz commented 2 years ago

... and prepare for Python 3.11 (dev).

spenpal commented 2 years ago

Any news on making this library Python 3.10 supported?

oleksandr-kuzmenko commented 2 years ago

JFYI https://github.com/PyYoshi/cChardet/pull/78

ooliver1 commented 2 years ago

After 6 months, will that PR get merged, or even looked at?

mikBighne98 commented 2 years ago

Are future updates such as Python 3.10 support coming, or has the project been dropped?

ooliver1 commented 2 years ago

It seems as though cchardet has been abandoned, yet many large projects depend on it.

NebularNerd commented 2 years ago

Are there any good alternatives to cchardet? If the repo is not getting any love then quite a few projects will need a replacement.

ooliver1 commented 2 years ago

^, it is possible to install gcc, but this won't be the biggest issue forever; 3.11/3.12 may somehow break this

NebularNerd commented 2 years ago

> ^, it is possible to install gcc, but this won't be the biggest issue forever; 3.11/3.12 may somehow break this

I assume this is for *nix users. I'm on Windows and it keeps throwing the 'C++ 14 Required' error when I try to install, presumably because on Windows it tries to compile with Visual C++ instead of gcc.

Does anyone know if I can manually compile this on Windows using my MinGW gcc install? I'd rather not download multiple GB of Visual Studio just for one Python package.

ooliver1 commented 2 years ago

> > ^, it is possible to install gcc, but this won't be the biggest issue forever; 3.11/3.12 may somehow break this
>
> I assume this is for *nix users. I'm on Windows and it keeps throwing the 'C++ 14 Required' error when I try to install, presumably because on Windows it tries to compile with Visual C++ instead of gcc.
>
> Does anyone know if I can manually compile this on Windows using my MinGW gcc install? I'd rather not download multiple GB of Visual Studio just for one Python package.

@NebularNerd Yes, this was *nix. Here are steps I found for Windows using MinGW

banagale commented 2 years ago

I ended up here while upgrading a project's Python version, when pip started failing with errors involving this package.

> Are there any good alternatives to cchardet? If the repo is not getting any love then quite a few projects will need a replacement.

It depends on what you're trying to do. There's an MIT-licensed package called charset_normalizer that many seem to have switched to.

charset_normalizer focuses on giving you the actual text content in usable, Unicode form.

cchardet, by contrast, seems to focus on telling you which encoding a text file uses. In a project I'm working on, the detected encoding is then passed to open().

charset_normalizer is like, "why bother with determining the exact encoding scheme?"

Instead, it works out which original encoding is most likely to decode successfully into usable text.

If you look at its documentation, it specifically compares itself with this package and calls out cChardet's apparent use of a C++ binding. It also claims higher accuracy, though possibly lower speed.
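To make the "decode first, name the encoding second" idea concrete, here is a minimal stdlib-only sketch. It is not charset_normalizer's actual algorithm (that library scores many candidate encodings; its own entry point is `charset_normalizer.from_bytes(data).best()`). The helper name `decode_best_effort` and the candidate list are assumptions made up for illustration.

```python
from typing import Sequence, Tuple


# Hypothetical helper for illustration; not part of cchardet or charset_normalizer.
def decode_best_effort(
    data: bytes,
    candidates: Sequence[str] = ("utf-8", "cp1252"),
) -> Tuple[str, str]:
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding; try the next
    # Last resort: latin-1 maps every byte to a code point, so it never fails.
    return data.decode("latin-1"), "latin-1"
```

Calling `decode_best_effort(raw_bytes)` hands back usable text straight away, with the encoding name as a by-product. That is roughly the opposite order from detecting an encoding and then passing it to open().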

NebularNerd commented 2 years ago

Thanks @ooliver1 and @banagale for your replies. I'm going to take a good look at charset_normalizer, since making anyone install gcc just to compile cChardet for my small Subtotxt script seems a trifle excessive.

In the meantime I'll compile it with gcc as an interim bodge.

mohd-akram commented 1 year ago

It was working for me on Python 3.10, but now fails to install on Python 3.11:

/usr/bin/clang -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -pipe -Os -isysroot/Library/Developer/CommandLineTools/SDKs/MacOSX12.sdk -Isrc/ext/uchardet/src -I/opt/local/Library/Frameworks/Python.framework/Versions/3.11/include/python3.11 -c src/cchardet/_cchardet.cpp -o build/temp.macosx-12.0-x86_64-cpython-311/src/cchardet/_cchardet.o
src/cchardet/_cchardet.cpp:196:12: fatal error: 'longintrepr.h' file not found
  #include "longintrepr.h"
           ^~~~~~~~~~~~~~~
1 error generated.
error: command '/usr/bin/clang' failed with exit code 1

EDIT: Manually installing cython beforehand seems to fix the issue (possibly related to cython/cython#4461).

SimplicityGuy commented 1 year ago

There are two PRs, #78 and #80, that will address this. @PyYoshi, can you merge and release a new build, please?

ooliver1 commented 1 year ago

> There are two PRs, #78 and #80, that will address this. @PyYoshi, can you merge and release a new build, please?

It is pretty well established that they have abandoned cchardet. See the PRs you referenced: #78 is nearly a year old.

SimplicityGuy commented 1 year ago

> It is pretty well established that they have abandoned cchardet. See the PRs you referenced: #78 is nearly a year old.

Indeed. It's unfortunate since right now many downstream dependencies can't be completely installed with Python 3.11 due to build issues.

NebularNerd commented 1 year ago

At this stage it comes down to either moving to charset_normalizer or, if someone is willing, forking this and making cchardet-ng or similar.
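For downstream projects that cannot pick one option yet, one possible stopgap is to probe which detector is importable and use the first one found. The sketch below uses only the standard library. The helper name `first_available` and the preference order are assumptions for illustration, not something any of these packages provides.

```python
import importlib.util
from typing import Optional, Sequence


# Hypothetical helper: the module names and their order are illustrative assumptions.
def first_available(
    modules: Sequence[str] = ("cchardet", "charset_normalizer", "chardet"),
) -> Optional[str]:
    """Return the first top-level module name that can be imported, else None."""
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is not None:
            return name
    return None
```

A project could call `first_available()` at start-up and then branch on the result. The detectors do not share an identical API, so each branch still needs its own call.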

wbarnha commented 1 year ago

Might want to take a look at this: https://github.com/faust-streaming/cChardet

pip install faust-cchardet

The fork supports Python 3.10 and 3.11 now, so we're good. I'll open a PR so that if @PyYoshi comes back to this project some day, he can update it.