Closed darioackermann closed 6 months ago
Hi there 👋,
Thanks for reaching out. Can you elaborate a bit more on the issue you are experiencing? I appreciate that you dove directly into troubleshooting, but I am struggling to understand where that journey started.
Best Joe
@codingjoe I stumbled across a similar issue today while attempting to run relint **
locally - to make it clear, I'm not affected by the rich
issue, but relint was also not matching any files. As pointed out by @darioackermann, currently the following code snippet is used when running relint
:
As can be clearly seen, glob.escape
is called on every file before being passed to glob.iglob
. I do not understand the reason for using glob.escape
though, as this will make it impossible to actually have any form of globbing when specifying files for relint.
The README says that relint **
will scan all files recursively, but this is actually not true and depends on the operating system and shell that has been used, as relint itself does not do any kind of globbing. In other words:
:x: - Case 1: When literally calling relint **
outside of a shell context, relint will have an argv[1]
of **
, escape it to [*][*]
and pass that to glob.iglob
which will look for a literal file called **
, ending up doing nothing
:x: - Case 2: When calling relint **
in a shell which does not support double-asterisk globbing or has said option disabled (e.g. bash
when not setting globstar
), the shell will either pass a literal **
(= same result as case 1) or simply act as if *
was passed, meaning that the shell will do a non-recursive lookup and execute relint like relint a.py b.py c.py ...
. relint will check some files, but only those at the current directory level.
✅ - Case 3: When calling relint **
in a shell which does support double-asterisk globbing, all files will be recursively looked up and relint will be called like relint a/a.py a/b.py a/c.py b/test1.py b/test2.py ...
. Here relint will do what **
was supposed to do, as it is now explicitly passed all the files to check.
This is IMHO rather unintuitive and not what I'd expect from a linter, as now the behaviour can differ depending on which context the linter is being called in. I would expect this behaviour from a linter:
relint .
should recursively lint all files starting in the current directory (the preferred way, should be in the docs)relint a.py b.py
should just scan these two files which have been passedrelint some/folder
should recursively lint all files in some/folder
relint crazy/pat?ern/*.py
should use recursive globbing to lookup filesrelint **
, while a bit unusual, will continue to work and is now even recursive on "unsupported" shellsReplacing the code snippet further above with this one would implement exactly said functionality - this could obviously be split up into a helper function, but for brevity's sake, I made a single expression:
paths = {
file
for path in args.paths
for file in (
glob.iglob(os.path.join(path, '**'), recursive=True)
if os.path.isdir(path)
else glob.iglob(path, recursive=True)
)
}
Please note that I also renamed args.files
to args.paths
, as the latter is a more fitting expression for specifying such paths. Now calling relint
is no longer dependent on the specific operating system / shell that is used, and supports specifying files, directories or glob patterns.
Last but not least, I noticed that running relint on a huge project wastes a lot of time, as lint_file
will always attempt to open a file, then check if the file pattern actually matches. This is very inefficient and I would propose this code instead:
def lint_file(filename, tests):
for test in tests:
if test.file_pattern.match(filename):
break
else:
return
try:
with open(filename) as fs:
content = fs.read()
except (IsADirectoryError, UnicodeDecodeError):
pass
else:
for test in tests:
for match in test.pattern.finditer(content):
line_number = match.string[: match.start()].count("\n") + 1
yield filename, test, match, line_number
This provides a noticeable speedup on my system when running across a repository which also contains node_modules
and similar directories with many small files. The original implementation usually takes around ~12 seconds to complete, whereas the improved version only takes around ~2.5 seconds.
Could you maybe check this out and, if you agree, implement these or similar changes to relint? It's a great tool, especially in combination with pre-commit, but the usability is a bit tricky with the current behaviour.
@codingjoe One more thing, somewhat related to this - the pre-commit hook is also semi-broken, as --git-diff
is always passed to relint based on the current hook config.
Given that pre-commit
itself will always pass the files which should be linted, either those which are about to be committed or all of them in batches, in case --all-files
was specified, I would propose removing the args: [--git-diff]
option. This is a much safer approach, as then relint is guaranteed to run on the same files that pre-commit expects, and there is no need for relint
to duplicate the logic when being called from pre-commit
.
As of today though, running pre-commit run --all-files
simply does nothing in a project, as the --git-diff
argument passed to relint causes it to simply do nothing instead of linting as its supposed to. Simply removing this one line from the hook config would restore proper functionality in all cases.
Thanks for the detailed response and the deep-dive. Honestly, I don't recall why the pattern is being escaped. It does defeat the purpose. That being said, since it never worked, I'd favor simply removing tree walking all together and relying on users to pass files names. IMHO, it's best to keep the scope of this package small and on point.
@ppmathis you make a point about pre-commit, but I'd encourage you to open a separate discussion for that.
Resolved in v 3.1.1
Resolved in v 3.1.1
OS: Windows Python: 3.10
Setting up as described in the Readme leads to
relint **
parsing absolutely no files.glob.escape
leads toglob.iglob
returning with an empty set... if os.path.isfile(path)
i.e., modified main looks like this nowStill, when culprits are found, throws
AttributeError: type object 'Syntax' has no attribute 'guess_lexer'
It looks like there has been a major change torich
, and this package is not locking a major version of it.