codespell-project / codespell

check code for common misspellings
GNU General Public License v2.0

support ignore for blocks of code #3381

Open 12rambau opened 6 months ago

12rambau commented 6 months ago

This is a follow-up to #1212. It is now possible to ignore a single line of a file by adding a # codespell: ignore comment. Would it be possible to go further, mimic what Black does, and implement on/off comments?

It's easier to understand with a real-life example:

In my library I need to gather some data from the USDA, which associates each US state with a two-letter code. I therefore store a boring dictionary at the start of my module:

# fmt: off
ASSET_CODES = {
    "Alabama": "AL", "Arkansas": "AR", "Arizona": "AZ", "California": "CA", "Colorado": "CO",
    "Connecticut": "CT", "Delaware": "DE", "Georgia": "GA", "Florida": "FL", "Iowa": "IA", "Idaho": "ID",
    "Illinois": "IL", "Indiana": "IN", "Kansas": "KS", "Kentucky": "KY", "Louisiana": "LA", "Massachusetts": "MA",
    "Maryland": "MD", "Maine": "ME", "Michigan": "MI", "Minnesota": "MN", "Missouri": "MO", "Mississippi": "MS",
    "Montana": "MT", "Nebraska": "NE", "New Hampshire": "NH", "New Jersey": "NJ", "New Mexico": "NM",
    "Nevada": "NV", "New York": "NY", "North Carolina": "NC", "North Dakota": "ND", "Ohio": "OH", "Oklahoma": "OK", # codespell: ignore
    "Oregon": "OR", "Pennsylvania": "PA", "Rhode Island": "RI", "South Carolina": "SC", "South Dakota": "SD",
    "Tennessee": "TN", "Texas": "TX", "Utah": "UT", "Vermont": "VT", "Virginia": "VA", "Washington": "WA",
    "West Virginia": "WV", "Wisconsin": "WI", "Wyoming": "WY",
}
# fmt: on

When I run the codespell pre-commit hook, it flags "ND", which is the code for North Dakota, and suggests replacing it with "AND" or "2ND". The # codespell: ignore comment does its job perfectly, but it is hard to spot at the end of such a long line. If you look carefully at my example, the # fmt: off and # fmt: on comments disable Black for the whole length of the dict. Would it be possible to implement such comments for codespell?
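A minimal Python sketch of the requested behavior, for illustration only. The marker comments # codespell: off and # codespell: on are hypothetical (modeled on Black's # fmt: off / # fmt: on) and are not recognized by codespell, and lines_to_check is an invented helper. It walks the file line by line and drops everything from an "off" marker through the matching "on" marker, which fits codespell's line-based checking.

```python
import re

# Hypothetical markers modeled on Black's "# fmt: off" / "# fmt: on".
# codespell does not recognize these; they only illustrate the request.
OFF = re.compile(r"#\s*codespell:\s*off\b")
ON = re.compile(r"#\s*codespell:\s*on\b")


def lines_to_check(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that fall outside any off/on region."""
    checked = []
    enabled = True
    for lineno, line in enumerate(text.splitlines(), start=1):
        if enabled and OFF.search(line):
            enabled = False  # skip this line and everything up to the "on" marker
            continue
        if not enabled:
            if ON.search(line):
                enabled = True  # the "on" marker line itself is skipped too
            continue
        checked.append((lineno, line))
    return checked


source = """\
# codespell: off
CODES = {"North Dakota": "ND"}
# codespell: on
print("checked line")
"""
print(lines_to_check(source))  # [(4, 'print("checked line")')]
```

Only line 4 would be spell-checked; the dictionary between the markers is never looked at, so no per-line ignore comment is needed.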

julian-smith-artifex-com commented 2 months ago

I have recently created a PR that adds support for ignoring regions that span multiple lines: #3476.

It works by letting the user specify a regex that is matched against the file with re.DOTALL (so "." also matches newlines) before codespell's line-based check runs.

So the begin/end tags are not hard-coded; instead, one sets them on the command line.
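To make the mechanism concrete, here is a rough sketch of how a DOTALL regex could be combined with a line-based check. It is not the code from #3476: the toy MISSPELLINGS dictionary and the mask_regions and check helpers are hypothetical, and blanking matched text while keeping its newlines (so reported line numbers stay correct) is an assumption about how matches are excluded.

```python
import re

# Toy stand-in for codespell's dictionary (hypothetical: lowercase word -> suggestions).
MISSPELLINGS = {"nd": "and, 2nd", "wich": "which"}
WORD = re.compile(r"[A-Za-z]+")


def mask_regions(text: str, pattern: str) -> str:
    """Blank out every match of `pattern`, keeping newlines so line numbers don't shift.

    Keeping newlines is an assumption for this sketch, not confirmed PR behavior.
    """
    regex = re.compile(pattern, re.DOTALL)  # DOTALL lets "." match "\n" as well
    return regex.sub(lambda m: re.sub(r"[^\n]", " ", m.group(0)), text)


def check(text: str, ignore_multiline_regex: str = "") -> list[tuple[int, str]]:
    """Report (line_number, word) for each known misspelling, scanning line by line."""
    if ignore_multiline_regex:
        text = mask_regions(text, ignore_multiline_regex)
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for word in WORD.findall(line):
            if word.lower() in MISSPELLINGS:
                hits.append((lineno, word))
    return hits


sample = """\
# codespell:ignore-begin
CODES = {"North Dakota": "ND"}
# codespell:ignore-end
wich one?
"""
print(check(sample))  # [(2, 'ND'), (4, 'wich')]
print(check(sample, r"codespell:ignore-begin.*codespell:ignore-end"))  # [(4, 'wich')]
```

Because the multiline pass runs first, the per-line logic does not need to know anything about the begin/end tags.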

For example, with codespell --ignore-multiline-regex 'codespell:ignore-begin.*codespell:ignore-end'

one can ignore code blocks with:

# codespell:ignore-begin
... codespell will not look at this text.
# codespell:ignore-end

or

// codespell:ignore-begin
... codespell will not look at this text.
// codespell:ignore-end
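One caveat with the example pattern, assuming it is run with Python's re and re.DOTALL as described: .* is greedy, so if a file contains two ignore blocks, a single match runs from the first ignore-begin to the last ignore-end and also hides the ordinary text between them. The non-greedy .*? stops at the nearest ignore-end. A small demonstration (the sample text and the visible helper are made up for illustration):

```python
import re

# Made-up file with two ignore blocks and ordinary text in between.
text = """\
# codespell:ignore-begin
ND
# codespell:ignore-end
wich is checked here
# codespell:ignore-begin
ND
# codespell:ignore-end
"""

greedy = r"codespell:ignore-begin.*codespell:ignore-end"
lazy = r"codespell:ignore-begin.*?codespell:ignore-end"


def visible(pattern: str) -> str:
    """Return the text left over after removing every DOTALL match of `pattern`."""
    return re.sub(pattern, "", text, flags=re.DOTALL)


print("wich" in visible(greedy))  # False: one match spans both blocks
print("wich" in visible(lazy))    # True: each block is matched separately
```

Using .*? in the --ignore-multiline-regex value should keep each block separate, though it is worth confirming against the released behavior.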