codespell-project / codespell

check code for common misspellings
GNU General Public License v2.0

Rouge ==> Rogue and -L error. #1390

Open kierun opened 4 years ago

kierun commented 4 years ago

My code uses a quote from Alexandre Dumas' Les Trois Mousquetaires as a string to test some code with. The quote contains "à Amsterdam, chez Pierre Rouge." Running codespell, I get: Rouge ==> Rogue. This is not a misspelling of rogue, not even close, but since this is a quote in an English file, that's fair, I guess. I tried to exclude it using:

; echo "Rouge" > ook
; codespell ook
ook:1: Rouge  ==> Rogue
; codespell -L Rouge ook
ook:1: Rouge  ==> Rogue
; codespell -L "Rouge" ook
ook:1: Rouge  ==> Rogue
; cp ook ignore
; codespell -I ignore ook
ook:1: Rouge  ==> Rogue
; codespell --version
1.16.0

From the help:

Words are case sensitive based on how they are written in the dictionary file

So I tried, for completeness' sake:

; codespell -L rouge ook
; echo $?
0

That seems like broken behaviour to me… Or am I missing something?

peternewman commented 4 years ago

The help text is also not accurate. Currently codespell lowercases the typo from the dictionary, then compares it to the ignore word/line from the ignore words file (without changing the case of the ignore entry), and if it's a match, excludes it from the list of typos. https://github.com/codespell-project/codespell/blob/233d76c21bfd4eff12c0c07d9a163010e85a0bdb/codespell_lib/_codespell.py#L319-L344
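To illustrate the comparison described above, here is a minimal sketch. It is not the actual codespell source: `build_misspellings` and `check_word` are hypothetical names, and the `typo->correction` line format and the lowercased lookup of words from the checked file are assumptions, chosen to match the behaviour reported in this issue.

```python
def build_misspellings(dictionary_lines, ignore_words):
    """Hypothetical helper: map lowercased typos to their corrections."""
    misspellings = {}
    for line in dictionary_lines:
        typo, correction = line.split("->", 1)
        key = typo.strip().lower()   # the dictionary side is lowercased
        if key in ignore_words:      # the ignore side is compared as typed
            continue
        misspellings[key] = correction.strip()
    return misspellings


def check_word(word, misspellings):
    """Assumed lookup: the word from the checked file is lowercased first."""
    return misspellings.get(word.lower())


dictionary = ["rouge->rogue"]

# Ignoring "Rouge" never matches the lowercased key, so the word is still flagged.
print(check_word("Rouge", build_misspellings(dictionary, {"Rouge"})))

# Ignoring "rouge" matches the key, so the entry is dropped and nothing is flagged.
print(check_word("Rouge", build_misspellings(dictionary, {"rouge"})))
```

Under these assumptions, `-L Rouge` has no effect while `-L rouge` silences the report, which matches the transcript in the original comment.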

But yes, I agree that, for now, they should probably all be lowercased before checking. That would, however, mean more work for users in future, once the dictionary can handle mixed case.
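The lowercasing suggested here could look roughly like the following. This is only a sketch of the idea, not a patch against codespell: `load_ignore_words` and `is_ignored` are hypothetical names, and the one-word-per-line input is an assumption.

```python
def load_ignore_words(lines):
    """Hypothetical helper: normalise ignore entries to lowercase."""
    return {line.strip().lower() for line in lines if line.strip()}


def is_ignored(typo, ignore_words):
    """Compare both sides in lowercase so the user's casing does not matter."""
    return typo.lower() in ignore_words


ignore_words = load_ignore_words(["Rouge\n", "\n"])
print(is_ignored("rouge", ignore_words))
print(is_ignored("Rouge", ignore_words))
print(is_ignored("rogue", ignore_words))
```

The trade-off is the one noted above: once everything is lowercased, an ignore entry can no longer distinguish "Rouge" from "rouge", which would matter if the dictionary later supports mixed case.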