codespell-project / codespell

check code for common misspellings
GNU General Public License v2.0

assertIn false positives #3430

Closed: nijel closed this issue 1 month ago

nijel commented 2 months ago

Codespell 2.3.0 started complaining about the assertIn method:

weblate/lang/tests.py:330: assertIn ==> asserting, assert in, assertion
weblate/lang/tests.py:331: assertIn ==> asserting, assert in, assertion

I know I can allow this particular word, but complaining about the standard Python test-suite API seems a bit annoying.

DimitriPapadopoulos commented 2 months ago

How about moving it to the code dictionary? Merge requests welcome.

CodyCBakerPhD commented 2 months ago

@DimitriPapadopoulos

How about moving it to the code dictionary? Merge requests welcome.

Happy to try to fix this. I tracked it down to this entry in the main dictionary: https://github.com/codespell-project/codespell/blob/master/codespell_lib/data/dictionary.txt#L5633

What I'm not clear on is how moving this to dictionary_code would help. It seems what we want is to ignore the specific capitalization .assertIn (preceded by either self or TestCase) in .py files.

Is there a separate place in codespell to specify conditional code-related ignores of items from the dictionary that would otherwise make sense in normal text?
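A minimal sketch of the kind of conditional ignore asked about above, assuming a regex-based approach: spans matching a hypothetical pattern for `self.assertIn` / `TestCase.assertIn` are blanked out before words are looked up. The pattern, the one-entry `TYPOS` table, the word regex and `check_line` are illustrative assumptions, not codespell's code. codespell's `--ignore-regex` option works along similar lines, but as far as I know it applies to every scanned file, not only `.py` files.

```python
import re

# Simplified word pattern (an assumption; not necessarily codespell's exact regex).
WORD_RE = re.compile(r"[\w\-']+")

# Hypothetical ignore rule: only the method-call form preceded by self/TestCase.
IGNORE_RE = re.compile(r"\b(?:self|TestCase)\.assertIn\b")

# One-entry stand-in for the main dictionary, keyed by lowercase typo.
TYPOS = {"assertin": ["asserting", "assert in", "assertion"]}


def check_line(line, ignore_re=IGNORE_RE):
    """Return the words in `line` that match a known typo, after masking ignored spans."""
    # Replace each ignored span with spaces of equal length so that
    # the column positions of the remaining words are unchanged.
    masked = ignore_re.sub(lambda m: " " * len(m.group()), line)
    return [w for w in WORD_RE.findall(masked) if w.lower() in TYPOS]


print(check_line('        self.assertIn("a", ["b"])'))  # []
print(check_line("assertIn(x, items)"))                 # ['assertIn']
```

The rule is deliberately narrow: a bare `assertIn` without the `self.` or `TestCase.` prefix is still reported.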

DimitriPapadopoulos commented 2 months ago

By design, codespell is mostly case-insensitive and doesn't distinguish between assertIn and assertin.
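To illustrate why the capitalization doesn't help, here is a minimal sketch of a case-insensitive lookup: each word is lowercased before being checked against the typo table, so `assertIn` and `assertin` hit the same entry. The `typo->fix1, fix2` line format follows the dictionary entries quoted in this thread; the parser, word pattern and function names are illustrative assumptions, not codespell's implementation.

```python
import re

# Dictionary entry reconstructed from the report above ("typo->fix1, fix2" format).
DICTIONARY_LINES = ["assertin->asserting, assert in, assertion"]

# Simplified word pattern (an assumption; not necessarily codespell's exact regex).
WORD_RE = re.compile(r"[\w\-']+")


def load_dictionary(lines):
    """Parse 'typo->fix1, fix2' lines into {lowercase typo: [fixes]}."""
    misspellings = {}
    for line in lines:
        typo, fixes = line.split("->", 1)
        # Drop empty items, e.g. those produced by a trailing comma.
        misspellings[typo.strip().lower()] = [f.strip() for f in fixes.split(",") if f.strip()]
    return misspellings


def check_line(line, misspellings):
    """Return (word, fixes) pairs; the lookup ignores the word's case."""
    hits = []
    for word in WORD_RE.findall(line):
        fixes = misspellings.get(word.lower())
        if fixes is not None:
            hits.append((word, fixes))
    return hits


typos = load_dictionary(DICTIONARY_LINES)
for word, fixes in check_line('self.assertIn("a", ["b"])', typos):
    print(f"{word} ==> {', '.join(fixes)}")
# prints: assertIn ==> asserting, assert in, assertion
```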

CodyCBakerPhD commented 2 months ago

OK, so how would I define an ignore rule for assertin when scanning Python code?

DimitriPapadopoulos commented 2 months ago

There's the code dictionary. Because codespell is language-agnostic, you cannot define a rule for Python files only.

CodyCBakerPhD commented 2 months ago

The code dictionary seems to have definitions of common misspellings of code, e.g.,

agrv->argv
...
arange->arrange, a range,

These make sense.

The problem is that a simple snippet using only the Python standard library, like

import unittest

class MyTest(unittest.TestCase):
    def test_membership(self):
        # codespell 2.3.0 reports "assertIn ==> asserting, assert in, assertion" here
        self.assertIn("a", ["a", "b"])

makes codespell report the assertIn call as a misspelling, with suggestions taken from the main (non-code) dictionary: https://github.com/codespell-project/codespell/blob/master/codespell_lib/data/dictionary.txt#L5633

So are you saying this is a broader issue: the non-code dictionary is applied to, and makes suggestions on, code rather than just prose?

DimitriPapadopoulos commented 2 months ago

Everything's text as far as codespell is concerned, whether Shakespeare prose or source code.

You just get to select the typo lists/dictionaries you want to use: the default ones or your own selection. The code dictionary contains typos that are invalid English words in a general context but valid words in the context of source code. You're supposed to avoid the code dictionary when spellchecking source code, or use it at the expense of more false positives in the hope of catching a few true positives.
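A minimal sketch of what selecting dictionaries means in practice, assuming (as I understand codespell's defaults) that only the `clear` and `rare` built-in lists are enabled unless `--builtin` also names `code`. The tiny tables and the `select` / `is_flagged` helpers are hypothetical stand-ins for the real files, laid out as they would be once assertin sits in the code list and agrv in the main one.

```python
# Hypothetical, heavily abbreviated stand-ins for codespell's built-in lists.
BUILTIN = {
    "clear": {"agrv": ["argv"]},  # main dictionary
    "rare": {},
    "code": {
        "assertin": ["asserting", "assert in", "assertion"],
        "arange": ["arrange", "a range"],
    },
}


def select(builtin="clear,rare"):
    """Merge the named lists, mimicking a comma-separated --builtin value."""
    merged = {}
    for name in builtin.split(","):
        merged.update(BUILTIN[name.strip()])
    return merged


def is_flagged(word, builtin="clear,rare"):
    """True if the word's lowercase form is a typo in the selected lists."""
    return word.lower() in select(builtin)


print(is_flagged("assertIn"))                     # False: code list not selected
print(is_flagged("assertIn", "clear,rare,code"))  # True: code list opted in
print(is_flagged("agrv"))                         # True: main list is always on
```

Leaving `code` out by default avoids false positives on identifiers such as `assertIn` and `arange` when checking source; opting in trades more noise for a chance at a few extra real typos.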

peternewman commented 1 month ago

How about moving it to the code dictionary? Merge requests welcome.

I've opened #3469 for this specific aspect.

The code dictionary seems to have definitions of common misspellings of code, e.g.,

agrv->argv

This shouldn't be in there, as it will rarely be triggered: if you're checking code, you won't use this dictionary. It belongs in the main dictionary, where it would actually work, although it might not pass CI there because argv isn't a word in a proper dictionary.

arange->arrange, a range,

This is a NumPy function, so it needs to be skipped when you're checking code (so that it doesn't become arrange or a range): https://numpy.org/doc/stable/reference/generated/numpy.arange.html

DimitriPapadopoulos commented 1 month ago

This can be closed once assertin has been moved to the code dictionary.

As for agrv->argv and similar entries, they were moved from the code dictionary to the main dictionary in #3470.