codespell-project / codespell

check code for common misspellings
GNU General Public License v2.0

Better control over decoding problems reporting #3427

Closed yarikoptic closed 3 months ago

yarikoptic commented 3 months ago

Encountered in the scope of

ATM codespell consistently warns each time it is unable to decode a file as utf-8, even if the file is not intended to be in utf-8 (this can be avoided only via the -q, --quiet-level CLI option or the quiet-level = 3 config option), e.g. there

❯ codespell
WARNING: Cannot decode file using encoding "utf-8": ./lib/spack/spack/test/data/filter_file/x86_cpuid_info.c
WARNING: Trying next encoding "iso-8859-1"
WARNING: Cannot decode file using encoding "utf-8": ./var/spack/repos/builtin/packages/sz/test/testfloat_8_8_128.dat
WARNING: Trying next encoding "iso-8859-1"

even though it then possibly finds/fixes some typos using iso-8859-1. I think it would be valuable to tame codespell down so that it avoids complaining, or uses an alternative encoding, for some files. E.g. it could potentially react to a pragma codespell-encoding: in the header of the file (e.g. the first 10 lines or 1000 bytes, whichever is smaller) so authors could provide instructions for a custom encoding, similarly to how modes for emacs / vim are typically specified in the header.

WDYT?
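To make the pragma idea concrete, here is a minimal sketch of how such a header lookup could work. Everything in it is an assumption for illustration: codespell does not implement a codespell-encoding: pragma, and the names `find_encoding_pragma`, `decode_with_pragma`, `MAX_BYTES` and `MAX_LINES` are hypothetical. The fallback order (utf-8, then iso-8859-1) simply mirrors the warnings shown above.

```python
import re
from typing import Optional, Sequence, Tuple

# Hypothetical pragma syntax from the proposal; NOT an existing codespell feature.
PRAGMA_RE = re.compile(rb"codespell-encoding:\s*([A-Za-z0-9._-]+)")

MAX_BYTES = 1000  # only inspect the first 1000 bytes ...
MAX_LINES = 10    # ... and at most the first 10 lines within them


def find_encoding_pragma(data: bytes) -> Optional[str]:
    """Return the encoding named by a pragma in the file header, or None."""
    header = data[:MAX_BYTES]
    for line in header.splitlines()[:MAX_LINES]:
        match = PRAGMA_RE.search(line)
        if match:
            return match.group(1).decode("ascii")
    return None


def decode_with_pragma(
    data: bytes, fallbacks: Sequence[str] = ("utf-8", "iso-8859-1")
) -> Tuple[str, str]:
    """Decode data, trying a declared encoding first, then the fallbacks.

    Returns (text, encoding_used). No warning is emitted when the
    declared encoding succeeds, which is the point of the proposal.
    """
    declared = find_encoding_pragma(data)
    candidates = ([declared] if declared else []) + list(fallbacks)
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # LookupError covers a pragma naming an unknown codec.
            continue
    raise ValueError("no candidate encoding could decode the data")
```

With this sketch, a file whose first line contains `/* codespell-encoding: iso-8859-1 */` would be decoded with that encoding straight away, while a file without a pragma would go through the usual utf-8 then iso-8859-1 sequence.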

peternewman commented 3 months ago

Please see this option, where you can turn off the decoding alerts if desired: https://github.com/codespell-project/codespell/blob/5c0343bda132afe7a3721419ac2a863a533c612d/codespell_lib/_codespell.py#L518-L536

yarikoptic commented 3 months ago

The CLI option is good to know, thank you; I had managed to miss it. I adjusted the original description to be more specific: some files might use/need a different encoding, so disabling the warnings overall might be undesired.

But since I think the original use-case is satisfied by suppressing the warnings overall (done now), I would consider this to be largely not needed, so I will close.

Thank you @peternewman !