codespell-project / codespell

check code for common misspellings
GNU General Public License v2.0

feat: report file name of file that chardet fails to read #3524

Open corneliusroemer opened 2 months ago

corneliusroemer commented 2 months ago

resolves #3519

Tested, and it now reports the file name:

```
codespell --write-changes -i3 -C 5 -H -f -e --count -s --builtin clear,rare,names
Failed to decode file ./pep_sphinx_extensions/tests/pep_lint/test_pep_number.py using detected encoding Windows-1254.
Traceback (most recent call last):
  File "/Users/corneliusromer/micromamba/envs/codespell/bin/codespell", line 8, in <module>
    sys.exit(_script_main())
             ^^^^^^^^^^^^^^
  File "/Users/corneliusromer/code/codespell/codespell_lib/_codespell.py", line 1103, in _script_main
    return main(*sys.argv[1:])
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/corneliusromer/code/codespell/codespell_lib/_codespell.py", line 1300, in main
    bad_count += parse_file(
                 ^^^^^^^^^^^
  File "/Users/corneliusromer/code/codespell/codespell_lib/_codespell.py", line 945, in parse_file
    lines, encoding = file_opener.open(filename)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/corneliusromer/code/codespell/codespell_lib/_codespell.py", line 232, in open
    return self.open_with_chardet(filename)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/corneliusromer/code/codespell/codespell_lib/_codespell.py", line 246, in open_with_chardet
    lines = self.get_lines(f)
            ^^^^^^^^^^^^^^^^^
  File "/Users/corneliusromer/code/codespell/codespell_lib/_codespell.py", line 303, in get_lines
    lines = f.readlines()
            ^^^^^^^^^^^^^
  File "/Users/corneliusromer/micromamba/envs/codespell/lib/python3.12/encodings/cp1254.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 1349: character maps to <undefined>
```
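
The traceback shows that the error is not raised when the file is opened but lazily, by `f.readlines()` inside `get_lines`, once the bytes are decoded with the encoding chardet guessed. A minimal sketch of the reporting idea, assuming a standalone helper (`read_lines_with_encoding` is a hypothetical name, not the code in this PR), therefore wraps the read itself, prints the file name and detected encoding to stderr, and re-raises:

```python
import sys
from typing import List, Tuple


def read_lines_with_encoding(filename: str, encoding: str) -> Tuple[List[str], str]:
    """Hypothetical helper: read a file using an encoding detected beforehand.

    Text-mode files decode lazily, so the try block must cover readlines(),
    not only open(), to catch a UnicodeDecodeError.
    """
    try:
        with open(filename, encoding=encoding, newline="") as f:
            lines = f.readlines()
    except UnicodeDecodeError:
        # Name the offending file before the traceback, then re-raise.
        print(
            f"Failed to decode file {filename} using detected encoding {encoding}.",
            file=sys.stderr,
        )
        raise
    # Return the encoding name that was passed in alongside the lines.
    return lines, encoding
```

For example, a file containing byte 0x90 read with `"Windows-1254"` (which Python maps to its `cp1254` codec) prints the message and then raises the same `UnicodeDecodeError` as above, because 0x90 is undefined in that code page.
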
corneliusroemer commented 2 months ago

I've added tests because the Codecov check failed otherwise. Not sure these are super important, but what's done is done! I learned something about testing along the way, and about mocking!
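
Tests like these can avoid shipping an undecodable fixture file by mocking the file object instead. A minimal sketch with the standard library's `unittest.mock`, assuming a hypothetical `read_lines` function (not codespell's actual code or test suite) that prints the file name before re-raising:

```python
import io
import sys
import unittest
from contextlib import redirect_stderr
from typing import List
from unittest import mock


def read_lines(filename: str, encoding: str) -> List[str]:
    """Hypothetical function under test: report the file name on decode errors."""
    try:
        with open(filename, encoding=encoding, newline="") as f:
            return f.readlines()
    except UnicodeDecodeError:
        print(
            f"Failed to decode file {filename} using detected encoding {encoding}.",
            file=sys.stderr,
        )
        raise


class ReadLinesTest(unittest.TestCase):
    def test_reports_filename_on_decode_error(self) -> None:
        # Build the kind of error cp1254 raises for the undefined byte 0x90.
        err = UnicodeDecodeError(
            "charmap", b"\x90", 0, 1, "character maps to <undefined>"
        )
        fake_open = mock.mock_open()
        # Make readlines() on the mocked file handle raise that error.
        fake_open.return_value.readlines.side_effect = err
        stderr = io.StringIO()
        with mock.patch("builtins.open", fake_open), redirect_stderr(stderr):
            with self.assertRaises(UnicodeDecodeError):
                read_lines("some/file.py", "Windows-1254")
        self.assertIn("some/file.py", stderr.getvalue())
        fake_open.assert_called_once_with(
            "some/file.py", encoding="Windows-1254", newline=""
        )
```

Run it with `python -m unittest` or pytest. Patching `builtins.open` keeps the test independent of the filesystem; a real temporary file with a 0x90 byte would exercise the actual codec instead.
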

DimitriPapadopoulos commented 2 months ago

I am fine with Codecov failing on uncovered exception handlers, but I don't have the rights to merge when CI tests fail. That said, adding tests looks like the best option.

corneliusroemer commented 2 months ago

Indeed, I will open an issue in the chardet repository if I remember.

On Tue, Aug 20, 2024, 09:50 Dimitri Papadopoulos Orfanos wrote:


In codespell_lib/_codespell.py (https://github.com/codespell-project/codespell/pull/3524#discussion_r1722852956):

```diff
-        return lines, f.encoding
+        return lines, encoding
```

Indeed. The chardet documentation should describe the return values in more detail.
