chardet / chardet

Python character encoding detector
GNU Lesser General Public License v2.1

test_detect_all_and_detect_one_should_agree fails on Python 3.11b3 #256

Open musicinmybrain opened 2 years ago

musicinmybrain commented 2 years ago
$ python3.11 --version
Python 3.11.0b3
$ python3.11 -m venv _e
$ . _e/bin/activate
(_e) $ pip install -e .
(_e) $ pip install pytest hypothesis
(_e) $ pytest

results in:

====================================================== FAILURES ======================================================
____________________________________ test_detect_all_and_detect_one_should_agree _____________________________________

txt = 'Ā𐀀', enc = 'utf-8', _ = HypothesisRandom(generated data)

    @given(
        st.text(min_size=1),
        st.sampled_from(
            [
                "ascii",
                "utf-8",
                "utf-16",
                "utf-32",
                "iso-8859-7",
                "iso-8859-8",
                "windows-1255",
            ]
        ),
        st.randoms(),
    )
    @settings(max_examples=200)
    def test_detect_all_and_detect_one_should_agree(txt, enc, _):
        try:
            data = txt.encode(enc)
        except UnicodeEncodeError:
            assume(False)
        try:
            result = chardet.detect(data)
            results = chardet.detect_all(data)
>           assert result["encoding"] == results[0]["encoding"]
E           AssertionError: assert None == 'utf-8'

test.py:183: AssertionError

The above exception was the direct cause of the following exception:

    @given(
>       st.text(min_size=1),
        st.sampled_from(
            [
                "ascii",
                "utf-8",
                "utf-16",
                "utf-32",
                "iso-8859-7",
                "iso-8859-8",
                "windows-1255",
            ]
        ),
        st.randoms(),
    )

test.py:160: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

txt = 'Ā𐀀', enc = 'utf-8', _ = HypothesisRandom(generated data)

    @given(
        st.text(min_size=1),
        st.sampled_from(
            [
                "ascii",
                "utf-8",
                "utf-16",
                "utf-32",
                "iso-8859-7",
                "iso-8859-8",
                "windows-1255",
            ]
        ),
        st.randoms(),
    )
    @settings(max_examples=200)
    def test_detect_all_and_detect_one_should_agree(txt, enc, _):
        try:
            data = txt.encode(enc)
        except UnicodeEncodeError:
            assume(False)
        try:
            result = chardet.detect(data)
            results = chardet.detect_all(data)
            assert result["encoding"] == results[0]["encoding"]
        except Exception as exc:
>           raise RuntimeError(f"{result} != {results}") from exc
E           RuntimeError: {'encoding': None, 'confidence': 0.0, 'language': None} != [{'encoding': 'utf-8', 'confidence': 0.505, 'language': ''}]

test.py:185: RuntimeError
----------------------------------------------------- Hypothesis -----------------------------------------------------
Falsifying example: test_detect_all_and_detect_one_should_agree(
    txt='Ā𐀀', enc='utf-8', _=HypothesisRandom(generated data),
)
============================================== short test summary info ===============================================
FAILED test.py::test_detect_all_and_detect_one_should_agree - RuntimeError: {'encoding': None, 'confidence': 0.0, '...
================================ 1 failed, 375 passed, 6 xfailed, 1 xpassed in 9.79s =================================

The same steps succeed with Python 3.10.4.
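For anyone who wants to check the falsifying example outside of Hypothesis, here is a minimal sketch. It assumes only the two public functions already used in the test, `chardet.detect` and `chardet.detect_all`; the helper names `compare_detectors` and `check_reported_example` are made up for illustration. The detectors are passed in as arguments so the comparison logic can be exercised with stand-ins.

```python
"""Compare detect() and detect_all() on a single input (illustrative sketch)."""


def compare_detectors(data, detect, detect_all):
    """Return (single, best, agree) for the given bytes.

    `detect` is expected to return a dict with an "encoding" key and
    `detect_all` a list of such dicts, as in the failing test.
    """
    single = detect(data)["encoding"]
    ranked = detect_all(data)
    # Guard against an empty list so the helper never raises IndexError.
    best = ranked[0]["encoding"] if ranked else None
    return single, best, single == best


def check_reported_example():
    """Run the comparison on the falsifying example from the report."""
    import chardet  # assumption: chardet is installed in the active environment

    data = "Ā𐀀".encode("utf-8")  # txt and enc taken from the Hypothesis output
    return compare_detectors(data, chardet.detect, chardet.detect_all)
```

Calling `check_reported_example()` in the same virtual environment should show whether the two functions disagree on that one input without involving Hypothesis at all.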

mgorny commented 2 years ago

Actually, it seems to yield unstable results with any version of Python. Sometimes it passes, sometimes it fails.

musicinmybrain commented 2 years ago

> Actually, it seems to yield unstable results with any version of Python. Sometimes it passes, sometimes it fails.

Interesting! I went back and tried this repeatedly—ten or so times each—and for me, it appears to always fail on Python 3.11.0b3, and never on 3.10.4. Curious.

dan-blanchard commented 2 years ago

In my experience, this test is flaky on different machines. Figuring out why has been on my to-do list for a very long time, so any help there would be greatly appreciated.
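One way to narrow down the flakiness might be to separate "Hypothesis generated a different input" from "the same input gives different answers". The sketch below repeats the comparison on one fixed input and tallies the outcomes. The helper name `tally_outcomes` is hypothetical, and the detectors are passed in as parameters (in practice `chardet.detect` and `chardet.detect_all`). This is only a diagnostic aid under those assumptions, not a known explanation of the failure.

```python
from collections import Counter


def tally_outcomes(data, detect, detect_all, runs=10):
    """Count each (detect, detect_all) encoding pair seen over `runs` calls.

    A single key in the result means the detectors behaved the same on
    every call for this input; more than one key means the result varied.
    """
    outcomes = Counter()
    for _ in range(runs):
        single = detect(data)["encoding"]
        ranked = detect_all(data)
        best = ranked[0]["encoding"] if ranked else None
        outcomes[(single, best)] += 1
    return outcomes
```

If the tally for the reported input has only one key on a given machine, the run-to-run variation is more likely coming from which examples Hypothesis happens to generate than from chardet itself.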

nieder commented 6 months ago

test_detect_all_and_detect_one_should_agree is also failing on macOS with Python 3.7 through 3.10 from v5.2.0