codespell-project / codespell

check code for common misspellings
GNU General Public License v2.0

Creating specific ignore lists for specific files? #2491

Open a-t-0 opened 2 years ago

a-t-0 commented 2 years ago

Context

Suppose one wants to ignore the word "cips" in the file latex/references.bib, but not in any other file.

Example

Normally, one can ignore specific words in latex/references.bib by listing them in a file, codespell/ignore_references_bib.txt, and passing it with:

codespell -I codespell/ignore_references_bib.txt

However, this applies the list of ignored words to all files.

Question

Is it possible to apply a file of ignored words to one specific file only and, if so, how?
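One possible workaround, sketched here under assumptions rather than as a built-in feature, is to run codespell more than once: once per special file with its own -I list, and once for everything else with those files skipped. The helper name build_codespell_commands is hypothetical, and the exact path or glob form that --skip needs in order to match latex/references.bib may differ between codespell versions.

```python
import shlex


def build_codespell_commands(per_file_lists, root="."):
    """Return one codespell command per special file, plus one for the rest.

    per_file_lists maps a file path to the ignore list used only for that file.
    This helper is hypothetical; it only builds command lines, it runs nothing.
    """
    commands = []
    for path, ignore_list in per_file_lists.items():
        # Check this file alone, with its own ignore list.
        commands.append(["codespell", "-I", ignore_list, path])

    # Check everything else, skipping the files already handled above.
    # Assumption: --skip accepts these paths as written.
    rest = ["codespell"]
    if per_file_lists:
        rest += ["--skip", ",".join(per_file_lists)]
    rest.append(root)
    commands.append(rest)
    return commands


if __name__ == "__main__":
    cmds = build_codespell_commands(
        {"latex/references.bib": "codespell/ignore_references_bib.txt"}
    )
    for cmd in cmds:
        print(shlex.join(cmd))
```

The cost of this approach is one extra codespell invocation per special file, which is why a per-file option inside codespell itself would still be preferable.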

bartlettroscoe commented 11 months ago

This is a feature we would really like to have. The current approach of only allowing a global ignore list has many disadvantages:

It seems it would be so easy to implement and support such a feature. A suggestion from @markcmiller86 in https://github.com/betterscientificsoftware/bssw.io/pull/1874 is ...

Since codespell operates on ASCII files only, it just needs to look for a magic ASCII phrase of some kind (unlikely to ever appear in any real ASCII content) that can be used as a sort of meta-escape character, so that when it is encountered, data following it (or perhaps bracketed by it, etc.) can then be passed to codespell. That doesn't rely on any knowledge of the particular ASCII format in use. It just requires some ingenuity and creativity...

@codespell-ignore-words{...}

Then, anyone wanting to use that needs to find their own way to harmlessly embed it in their ascii files...
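To make the idea concrete, here is a minimal sketch of how such a marker could be located in a file's text. The marker syntax is taken from the suggestion above; codespell does not recognise it today, and the function name and the assumption that words inside the braces are separated by commas or whitespace are mine.

```python
import re

# Hypothetical marker syntax from the suggestion above; not a codespell feature.
MARKER = re.compile(r"@codespell-ignore-words\{([^}]*)\}")


def words_ignored_in(text):
    """Collect the words listed in every marker found in `text`."""
    words = set()
    for match in MARKER.finditer(text):
        # Assumed format: words separated by commas and/or whitespace.
        words.update(w for w in re.split(r"[,\s]+", match.group(1)) if w)
    return words


if __name__ == "__main__":
    sample = (
        "% @codespell-ignore-words{cips, nd}\n"
        "@article{example, title = {CIPS results}}\n"
    )
    print(sorted(words_ignored_in(sample)))
```

Because the scan looks only for the marker text, it works the same way whether the marker sits in a LaTeX comment, a Python comment, or any other place the file format tolerates.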

This is a more common feature for spell checkers and less intrusive than the inline ignore feature discussed in:

mdeweerd commented 9 months ago

I have two approaches to this issue:

  1. A feature suggestion
  2. A workaround

Feature suggestion

We can currently specify a file whose lines, when matched exactly, are excluded (the -x option).

I suggest adding support for a YAML configuration file that specifies a regex for the file name and a regex for the lines or groups of lines to exclude in matching files. To allow this configuration to evolve, I add a key

Example:

regex_excludes:
  - files: (?x)^(.*/file1.*php|.*/.*-group-.*)$
    regex: \b\$varname\b
    multiline_regex: \bcodespell:disable\b.*?\bcodespell:enable\b
  - files: (?x)^(.*/file2.*php|.*/.*-othergroup-.*)$
    regex: \b\$othervarname\b
    multiline_regex: \bcodespell:disable\b.*?\bcodespell:enable\b
  - files: \.cpp$
    multiline_regex: \bcodespell:disable\b.*?\bcodespell:enable\b

At least one key (regex or multiline_regex) would be required. If 'files' is missing, the regexes apply to all files.
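As a rough illustration of how entries under regex_excludes could be selected for a given file, the sketch below holds the configuration as a plain Python list rather than parsing YAML. The function name rules_for, the sample patterns, and the use of re.search for the 'files' pattern are assumptions for illustration only; none of this exists in codespell.

```python
import re

# Hypothetical in-memory form of the proposed `regex_excludes` section.
REGEX_EXCLUDES = [
    {"files": r"(?x)^(.*/file1.*php|.*/.*-group-.*)$", "regex": r"\bvarname\b"},
    {
        "files": r"\.cpp$",
        "multiline_regex": r"\bcodespell:disable\b.*?\bcodespell:enable\b",
    },
    {"regex": r"\bcips\b"},  # no 'files' key: applies to all files
]


def rules_for(path, rules):
    """Return the entries whose 'files' pattern matches `path`."""
    selected = []
    for rule in rules:
        # At least one of the two regex keys is required.
        if "regex" not in rule and "multiline_regex" not in rule:
            raise ValueError("each entry needs 'regex' or 'multiline_regex'")
        pattern = rule.get("files")
        # A missing 'files' key means the entry applies everywhere.
        if pattern is None or re.search(pattern, path):
            selected.append(rule)
    return selected


if __name__ == "__main__":
    for name in ("src/file1_test.php", "main.cpp", "README.md"):
        print(name, len(rules_for(name, REGEX_EXCLUDES)))
```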

I imagine the algorithm for the multiline_regex could be the following:

  1. Execute codespell check as usual on the current file;
  2. If the file would be flagged, run the 'multiline_regex' on it: find each match, count the LF characters in it, and replace the match with a string containing that many LFs.
  3. Run codespell check on the updated contents.
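Step 2 above can be sketched as follows. This is only an illustration of the replacement idea, not code from codespell: the function name is hypothetical, and I assume the pattern is compiled with re.DOTALL so that '.' can span line breaks.

```python
import re


def blank_excluded_regions(text, multiline_regex):
    """Replace each match with as many LF characters as it contained.

    Line numbers in the remaining text stay the same, so anything the
    spell checker reports afterwards still points at the right line.
    """
    # Assumption: DOTALL is needed for '.' to cross line boundaries.
    pattern = re.compile(multiline_regex, re.DOTALL)

    def keep_newlines(match):
        return "\n" * match.group(0).count("\n")

    return pattern.sub(keep_newlines, text)


if __name__ == "__main__":
    source = "ok\n# codespell:disable\nteh cips\n# codespell:enable\nend\n"
    cleaned = blank_excluded_regions(
        source, r"\bcodespell:disable\b.*?\bcodespell:enable\b"
    )
    print(repr(cleaned))
```

Keeping the line count unchanged is the point of the LF replacement: the second codespell pass in step 3 can report positions without any remapping.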

Workaround

I created a script that helps maintain the file of exceptions. It can be run once codespell only reports exceptions that you do not want to exclude in a different way.

https://gist.github.com/mdeweerd/edecd82d542b150859f65e6b73bdef79