codespell-project / codespell

check code for common misspellings
GNU General Public License v2.0
1.84k stars 470 forks source link

Behavior of --skip with folder names #1915

Open peternewman opened 3 years ago

peternewman commented 3 years ago

Also these should all work the same:

We also need to check if --skip ./foo is equivalent to --skip ./foobar (on a file/folder named foobar) and likewise for --skip foo.

Originally posted by @peternewman in https://github.com/codespell-project/codespell/issues/1528#issuecomment-639940013

DimitriPapadopoulos commented 3 years ago

In my case, none of these work:

--skip ./.git/
--skip ./.git
--skip .git/
--skip .git
--skip=./.git/
--skip=./.git
--skip=.git/
--skip=.git
--skip ./tests/ci/checkpatch/
--skip ./tests/ci/checkpatch
--skip tests/ci/checkpatch/
--skip tests/ci/checkpatch
--skip=./tests/ci/checkpatch/
--skip=./tests/ci/checkpatch
--skip=tests/ci/checkpatch/
--skip=tests/ci/checkpatch
DimitriPapadopoulos commented 2 years ago

codespell -S ./.git attempts the following fnmatch calls and eventually succeeds:

>>> fnmatch.fnmatch(".git", "./.git")
False
>>> fnmatch.fnmatch("./.git", "./.git")
True
>>> 

codespell -S ./.git/ attempts the following fnmatch calls, all of them fail:

>>> fnmatch.fnmatch(".git", "./.git/")
False
>>> fnmatch.fnmatch("./.git", "./.git/")
False
>>> 

I guess the key is in the fnmatch documentation:

Note that the filename separator ('/' on Unix) is not special to this module. See module glob for pathname expansion (glob uses filter() to match pathname segments).

DimitriPapadopoulos commented 2 years ago

With the following setup:

$ tree
.
└── tests
    └── ci
        └── checkpatch
            └── spelling.txt

3 directories, 1 file
$ 
  1. First in my list, codespell -S tests/ci/checkpatch attempts among other the following fnmatch call, which fails:
    >>> fnmatch.fnmatch("./tests/ci/checkpatch", "tests/ci/checkpatch")
    False
    >>> 
  2. Same for codespell -S tests/ci/checkpatch/:
    >>> fnmatch.fnmatch("./tests/ci/checkpatch", "tests/ci/checkpatch/")
    False
    >>> 
  3. Same for codespell -S ./tests/ci/checkpatch/:
    >>> fnmatch.fnmatch("./tests/ci/checkpatch", "./tests/ci/checkpatch/")
    False
    >>> 
  4. On the other hand, codespell -S ./tests/ci/checkpatch does succeed after all:
    >>> fnmatch.fnmatch("./tests/ci/checkpatch", "./tests/ci/checkpatch")
    True
    >>> 

    I don't know why I thought it didn't in #2047 - I was probably tired and confused by too many attempts to understand.

So we just need to:

  1. take care of the leading ./: either add it to both the pattern and path under test, when missing, or remove it from both the pattern and path under test, if present,
  2. remove any trailing / from the pattern.
DimitriPapadopoulos commented 2 years ago

The trailing / is easy to fix.

About the leading ./, here are a few things I have noticed:

Clearly, something's wrong here. We want to match relative paths - relative to the directory passed as an argument to codespell. Instead, codespell prepends the directory to the file path, or ./ if no argument is given. This is a nuisance because it is counterintuitive, because the -skip argument needs to be adapted (--skip ./foo with plain codespell and --skip /current/dir/foo with codespell /current/dir), and because the special case of the current directory ./ feels completely wrong.

akien-mga commented 1 year ago

For the record, --skip also seems to be broken when used together with explicit paths:

$ codespell --skip ./thirdparty ./thirdparty/README.md 
./thirdparty/README.md:7: Retruns ==> Returns

My use case is running codespell via GitHub Actions only on files changed in a PR instead of the whole repository (so passing an explicit list of changed files, using the method in https://github.com/codespell-project/actions-codespell/issues/65), but still using a skip list to ignore thirdparty files, various specific files with false positive typos, and .git (which codespell seems to consistently fail at ignoring despite its claims).

DimitriPapadopoulos commented 1 year ago

You're doing the exact opposite of what you want, you are actually instructing codespell to check ./thirdparty/README.md instead of skipping it:

$ mkdir thirdparty
$ 
$ cat > thirdparty/README.md 
Retruns
$ 
$ codespell -S thirdparty
$ codespell -S ./thirdparty
$ codespell -S ./thirdparty/README.md
$ codespell -S ./thirdparty,./thirdparty/README.md 
$ codespell -S ./thirdparty ./thirdparty/README.md 
./thirdparty/README.md:1: Retruns ==> Returns
$ 
akien-mga commented 1 year ago

I think you misunderstood my example. I want codespell to ignore all files in the thirdparty folder, even if some or all of them get manually listed as paths to check. In other word, I want the skip list to take precedence over the paths to check list.

It works fine with individual files on the skip list, but not with folders:

$ codespell -S thirdparty/README.md thirdparty/README.md 
$ codespell -S ./thirdparty/README.md ./thirdparty/README.md 
$ codespell -S ./thirdparty ./thirdparty/README.md 
./thirdparty/README.md:8: Retruns ==> Returns

There's still this issue of mismatch between paths with or without the ./ prefix (so codespell -S ./thirdparty/README.md thirdparty/README.md doesn't actually skip the file as it doesn't match).

You may ask why I'm specifying the file manually in the list of paths to check if I also want to skip it, but that was just to have a simple example.

My real use case is as described above. In script form:

files=$(gh pr diff ${{ github.event.pull_request.number }} --name-only | xargs -I {} sh -c 'echo "./{}"' | tr '\n' ' ')
codespell -w -q 3 -S "./.git,./bin,./thirdparty,*.desktop,*.gen.*,*.po,*.pot,*.rc,./AUTHORS.md,./COPYRIGHT.txt,./DONORS.md,./core/input/gamecontrollerdb.txt,./core/string/locales.h,./editor/project_converter_3_to_4.cpp,./misc/scripts/codespell.sh,./platform/android/java/lib/src/com,./platform/web/node_modules,./platform/web/package-lock.json" -L "curvelinear,doubleclick,expct,findn,gird,hel,inout,lod,nd,numer,ot,te,vai" --builtin "clear,rare,en-GB_to_en-US" "$files"

Or shorter with what matters:

files=$(gh pr diff ${{ github.event.pull_request.number }} --name-only | xargs -I {} sh -c 'echo "./{}"' | tr '\n' ' ')
codespell -S "./thirdparty" "$files"

"$files is computed and may include files which might the pattern of the skip list. When the skip list includes folders, codespell doesn't match them properly.

(And the xargs -I {} sh -c 'echo "./{}"' | tr '\n' ' ' part is to workaround the weird ./ prefix issue.)

DimitriPapadopoulos commented 1 year ago

I think you misunderstood me, and I still think you do the exact opposite of what you would want.

Have a look at my examples: ---skip requires a comma-separated list of files and folders.

akien-mga commented 1 year ago

I'm fully aware, and I know what I'm doing. I think the workaround I had to do in https://github.com/godotengine/godot/pull/76842 for this bug makes my use case very explicit. The non pseudo code example I gave here makes it very clear that I know how to use --skip.

In case it's not clear:

usage: codespell [-h] [--version] [-d] [-c] [-w] [-D DICTIONARY] [--builtin BUILTIN-LIST] [--ignore-regex IGNORE_REGEX] [-I FILE] [-L WORDS] [--uri-ignore-words-list WORDS] [-r REGEX] [--uri-regex URI_REGEX]
                 [-s] [--count] [-S SKIP] [-x FILE] [-i INTERACTIVE] [-q QUIET_LEVEL] [-e] [-f] [-H] [-A LINES] [-B LINES] [-C LINES] [--config CONFIG] [--toml TOML]
                 [files ...]

I'm making use of this optional [files ...] argument to specify which files I want codespell to check. Some of those may or may not match patterns in the skip list, and I don't want codespell to check them if they do. It works with files, not with folders. That's all.

DimitriPapadopoulos commented 1 year ago

In the example you have sent, it may be expected that codespell does not skip ./thirdparty/README.md because you pass it as an explicit argument:

$ codespell -S thirdparty/README.md thirdparty/README.md 
$ codespell -S ./thirdparty/README.md ./thirdparty/README.md 
$ codespell -S ./thirdparty ./thirdparty/README.md 
./thirdparty/README.md:8: Retruns ==> Returns

This is a corner case, where at the same time you ask codespell to check a file and to skip it:

codespell -S <filename> <filename>

Should the --skip option win or the explicit file argument?

In any case, the whole skipping business needs to be overhauled, a large part of the broken use cases involve ./path vs. path.

This works:

$ codespell -S ./thirdparty
$ 
akien-mga commented 1 year ago

The three examples you quoted show clearly that what I'm expecting (skip takes precedence over explicit files listing when there's a conflict) works fine for files, and not for folders in the skip list.

But I leave it at that, we won't understand each other evidently.

DimitriPapadopoulos commented 1 year ago

Let's start over. These examples actually raise a precedence question. Explicit file argument or skip list? I think the skip list should win, and that is what usually happens. However, in this case, we do seem to have a precedence issue with folders:

$ mkdir thirdparty
$ 
$ cat > thirdparty/README.md 
Retruns
$ 
$ 
$ 
$ codespell -S ./thirdparty ./thirdparty/README.md
./thirdparty/README.md:1: Retruns ==> Returns
$ 
$ codespell -S thirdparty ./thirdparty/README.md
./thirdparty/README.md:1: Retruns ==> Returns
$ 
$ codespell -S ./thirdparty thirdparty/README.md
thirdparty/README.md:1: Retruns ==> Returns
$ 
$ codespell -S thirdparty thirdparty/README.md
thirdparty/README.md:1: Retruns ==> Returns
$ 
$ 
$ 
$ codespell -S ./thirdparty/README.md ./thirdparty/README.md
$ 
$ codespell -S thirdparty/README.md ./thirdparty/README.md
./thirdparty/README.md:1: Retruns ==> Returns
$ 
$ codespell -S ./thirdparty/README.md thirdparty/README.md
thirdparty/README.md:1: Retruns ==> Returns
$ 
$ codespell -S thirdparty/README.md thirdparty/README.md
$ 
$ 
$  
$ codespell -S ./thirdparty
$ 
$ codespell -S thirdparty
$ 

But then, I feel this a different issue, not the usual ./path vs. path issue, not an issue with folder names. You should probably open a new issue.