Latest pip prevents installation with Python > 3.9 due to cchardet

mementum commented 1 year ago

Issue https://github.com/Neoteroi/BlackSheep/issues/276 was opened for Python 3.11 but this also affects, for example installing under Windows with Python 3.10

The current determination of whether to use cchardet or cchardet happens here

https://github.com/Neoteroi/BlackSheep/blob/34ae49496fcb6e5809fd132fa63ba4776f900d54/blacksheep/messages.pyx#L7-L10

And it is then used here

https://github.com/Neoteroi/BlackSheep/blob/34ae49496fcb6e5809fd132fa63ba4776f900d54/blacksheep/messages.pyx#L157

The current requirements.txt is somehow conflicting, because it allows skipping installation of cchardet for Python 3.11 and the same time says that chardet (the non C version, i.e.: pure python) is also requirement for all versions.

https://github.com/Neoteroi/BlackSheep/blob/34ae49496fcb6e5809fd132fa63ba4776f900d54/requirements.txt#L3-L8

And charset-normalizer, as suggested in issue #276 should be used as a replacement for cchardet. Surprisingly it is a requirement but it is not used.

Suggested actions

Drop cchardet as a hard requirement for anything above 3.9. Because there are wheels up to that version. i.e. Change cchardet==2.1.7; python_version < '3.11' to cchardet==2.1.7; python_version < '3.10'
Change the imports to also import charset-normalizer and use it. The detect function is chardet compatible.
```
     try:
         import cchardet as chardet
     except ImportError:
         try:
             import charset_normalizer as chardet
         except ImportError
             import chardet
```
This is actually redundant given that charset-normalizer has been added a a hard requirement to requirements.txt. A better approach would be
```
     try:
         import cchardet as chardet
     except ImportError:
         import charset_normalizer as chardet
```
And drop the pure python chardet altogether from requirements.txt. Both blacksheep and charset-normalizer are good for python 3.8 => 3.11 (which is the minimum common denominator, taking blacksheep as the restriction)

The detect portion of the code needs no change.

Should you need a pull-request, don't hesitate to say: yes and I will submit one

RobertoPrevato commented 1 year ago

Hi @mementum Thank You for the analysis ~~- I must have done a mistake with the requirements file. And Yes, I'll be grateful if you can submit a PR for the cchardet issue.~~

Please note anyway that the requirements.txt file describes the development dependencies. Those include libraries used in integration tests like Flask, requests, etc. The setup.py file describes the actual dependencies of blacksheep. Side note: already have in mind to migrate to pyproject.toml, and I started doing it for other projects, but I didn´t take the time to look into packing Cython extensions that way, yet.

RobertoPrevato commented 1 year ago

@mementum It seems the problem is not only about cchardet. It's the new version of pip that doesn't accept packages without wheels. Also, the CI build in Windows doesn't work anymore because the --install-options has been deprecated.

I'll need to look into this anyway - I was planning to move to pyproject.toml as I did for other libraries, but I didn't follow the (build) news so much to know all details. ~~I'll be grateful if you submit a PR for the cchardet issue, I'll look into the other packages and into migrating to pyproject.toml.~~

RobertoPrevato commented 1 year ago

@mementum I hope you didn´t start preparing a PR in the last couple of hours, because I ended up fixing the build issue and upgrading several dependencies. After reading a bit about charset-normalizer, I decided to replace altogether both chardet and cchardet with it. It looks ideal!

Sorry for the three mentions in my comments. Thank You for the heads up about charset-normalizer!

RobertoPrevato commented 1 year ago

Fixed on main - still need to fix on v1 branch and publish to PyPi.

RobertoPrevato commented 1 year ago

Hi @mementum This has been fixed and released with

https://github.com/Neoteroi/BlackSheep/releases/tag/v1.2.14

It wasn´t affecting only Windows, all platforms with Python 3.10 and 3.11, because of a breaking change in the latest pip.

Thank You for reporting the issue and recommending charset-normalizer!

q0w commented 1 year ago

You could use https://github.com/faust-streaming/cChardet as active fork

techntools commented 1 year ago

After changing to charset-normalizer, what does the benchmarks say ?

Its a pure Python library compared to cchardet

RobertoPrevato commented 1 year ago

@techntools in this case performance is not an issue. The charset detection is only used when:

the client does not declare the content type encoding in the "content-type" request header and the incoming data is not encoded using utf-8 (in practice, most content is encoded that way)
or, the client declares a wrong content type (sends content data encoded with one encoding, declares a different one in the content-type request header)
the incoming data cannot be decoded using utf-8 or ISO-8859-1
only then, automatic detection is used (arguably I could remove the depedency altogether and make it configurable, and if nothing is configured the server might return BadRequest)

    async def text(self):
        body = await self.read()

        if body is None:
            return ""
        try:
            return body.decode(self.charset)
        except UnicodeDecodeError:
            # this can happen when the server returned a declared charset,
            # but its content is not actually using the declared encoding
            # a common encoding is 'ISO-8859-1', so before using chardet, we try with this
            if self.charset != 'ISO-8859-1':
                try:
                    return body.decode('ISO-8859-1')
                except UnicodeDecodeError:
                    # fallback to trying to detect the encoding;
                    return body.decode(charset_normalizer.detect(body)['encoding'])

Neoteroi / BlackSheep

Latest pip prevents installation with Python > 3.9 due to cchardet #331