fsfe / reuse-tool

reuse is a tool for compliance with the REUSE recommendations.
https://reuse.software
371 stars 141 forks source link

When running reuse via pre-commit, ignore untracked files (or: create `lint-file`) #578

Open vHanda opened 1 year ago

vHanda commented 1 year ago

Hello.

I sometimes have some random files which I haven't committed and don't necessarily plan to, but I don't want to add them to the '.gitignore'. Reuse complains about those files.

It would be awesome if I could run reuse with a --ignore-untracked option, which ignores all files which aren't in git index.


comment by @carmenbianca :

Other linters, pylint etc, when run via pre-commit, only run against staged/tracked files. This is a behaviour of pre-commit, which passes (I think) a list of files to the linters. reuse differs here, because there's presently no way to run the linter against a subset of files.

Let's create lint-file to lint a subset of files, and then allow the pre-commit to be configured to use that instead of lint.

carmenbianca commented 1 year ago

Tempted to mark this wontfix as feature creep. pre-commit can handle this for you. When you commit some files using pre-commit, it quickly stashes the untracked files, runs the pre-commit stuff, then un-stashes the untracked files.

ryandesign commented 1 year ago

Fine, then please update the documentation to provide the complete set of steps needed to accomplish this. Assume I didn't know anything about pre-commit before discovering the reuse tool.

ryandesign commented 1 year ago

To clarify: I have set up pre-commit to the best of my ability following the directions currently in the reuse documentation. Pre-commit runs the reuse tool when I commit anything. It prevents any commit from going through if there are any files in the directory that don't have proper license information, even if those files are untracked and are not part of the commit.

carmenbianca commented 1 year ago

Ah, you may be right. pre-commit stashes and unstashes unstaged tracked files.

We'll need to look at this issue some time then. I'm inclined towards a solution that doesn't introduce git-specific interactions into the tool. I think the solution might be to only lint the files tracked by pre-commit. There's an issue open for only linting specified files, but I'm on my phone and can't easily find it.

Wuestengecko commented 1 year ago

@carmenbianca

I'm inclined towards a solution that doesn't introduce git-specific interactions into the tool.

It has those already:

https://github.com/fsfe/reuse-tool/blob/61a2d74b71aec46f7cf00d9c4b89ae87b7b65aad/src/reuse/vcs.py#L78-L98

Looking at the ls-files invocation there and its man page, I found this interesting note (emphasis mine):

-i, --ignored

Show only ignored files in the output. When showing files in the index, print only those matched by an exclude pattern. When showing "other" files, show only those matched by an exclude pattern.\ Standard ignore rules are not automatically activated, therefore at least one of the --exclude* options is required.

This issue is about exactly this set of files: Not tracked by git (yet), but also not on any ignore list.

Unfortunately, simply dropping --ignore entirely would result in all intentionally .gitignore-d files to not be listed. So the options as far as I can see are:

I presume the latter might become a memory issue on very large repos, but then again those probably have a lot of ignored files laying around as well anyways.

nicorikken commented 1 year ago

For reference I think @carmenbianca was referring to issue #512

Wuestengecko commented 1 year ago

Hi! Is there any news about this issue? I had opened a PR which resolves this in #626, but there has been no movement on it for over a month now.

mxmehl commented 1 year ago

We just had out first maintainers call this year and discussed the issue in length.

First of all, let me say that we all feel the pain by linters vs. untracked files that you actually never want to track. So thank you for making us think about it and even proposing a PR! But we are convinced that somehow ignoring it would lead to more issues that it solves.

  1. We checked some other linters, e.g. black and pylint, and they all lint untracked files. It's somehow an expected behaviour, even if it isn't ideal.
  2. We think it's confusing if reuse lint may return different output after git add --all. After all, we cannot guess which untracked files the user actually would like to add later, and which ones are just test script or temporary artifacts.
  3. We discussed adding this as optional behaviour via a flag to lint but decided against it to keep this subcommand as simple as it currently is.
  4. If we implemented this feature, we also would have to add the same magic to the Mercurial function. Certainly doable, but not desirable as it's sometimes not trivial to align fine-grained details in both VCS.
  5. There are other ways to fix this experience.

Ad 5., here is how we would make one's life easier with this specific issue:

  1. You could stash the test files, run lint, and unstash again. Similar to how pre-commit does it.
  2. You could make such temporary test files be ignored on your machine. For this, create a ~/.gitignore in which you define a custom pattern of files which you would like to ignore, e.g. *.myignorepattern* matching files like test.myignorepattern.sh. Then, make your git read this gitignore file via git config --global core.excludesfile '~/.gitignore'

All of this said, we'd like to close #626 as well.

Wuestengecko commented 1 year ago

I fully understand the reasoning behind this decision.

As for fixing the problem, I went with option 3: A wrapper script around the git hook that asks whether I want to ignore the errors and continue anyway. This has served me well both with REUSE complaints as well as several other occasional false-positives. (It's as simple as ${0%/*}/pre-commit.real "$@" || read -q '?Continue anyway? ' in zsh.)

carmenbianca commented 1 year ago

@Wuestengecko Interesting! Normally when I encounter an issue like this with failing lints/tests because of unstaged files, I run git commit as normal, and subsequently pre-commit complains. I analyse why pre-commit failed, and if I deem it to be because of the unstaged files, I run git commit --no-verify, which skips pre-commit entirely.

(And if I use --no-verify too eagerly, the CI catches my mistakes, usually :) )

vHanda commented 1 year ago

I really disagree with the conclusions.

Just cause some other linters behave this way, that doesn't mean reuse should. The key difference between typical "linters" and reuse is that reuse applies to all files. So, while you can easily have other random files lying around, and not have that bother other linters, with reuse that isn't doable.

Regarding 5 -

a. Stashing does not affect untracked files. So doing a stash doesn't do anything. b. The whole point of having untracked files is that you want to see those files in the 'git status' and not have them ignored.

None of your alternatives really work.

Regardless, you've made your decision. My complaints aren't going to change your mind.

Please remember, the more difficult a tool is to use, the lower the chance that someone is going to adopt it or evangelize it.

Wuestengecko commented 1 year ago

Normally when I encounter an issue like this with failing lints/tests because of unstaged files, I run git commit as normal, and subsequently pre-commit complains. I analyse why pre-commit failed, and if I deem it to be because of the unstaged files, I run git commit --no-verify, which skips pre-commit entirely.

@carmenbianca I used to do it like this too, but having to remember to add -n the second time became annoying. Now I don't have to change the command anymore ;)

It's also convenient when I have a bunch of relatively unrelated changes in the work tree and want to split them into two or three different commits. If I see it's just the untracked files again for the second commit, and everything else was successful, I don't have to re-run at all.

And if I do mess up, I can rely on the CI as well. :)

Stashing does not affect untracked files.

@vHanda Use git stash -u

vHanda commented 1 year ago

@vHanda Use git stash -u

I need to apologize for the slightly annoyed message.

I didn't realize that git stash -u was an option. Now I'm sincerely humbled and appreciate that you all all took the time to address this so thoughtfully. I, now, also completely understand the thought process.

Again, my apologies for the slightly hostile message, and thank you for working on this.

ryandesign commented 1 year ago

Well, I don't completely understand yet, and still don't know how to accomplish what I want.

What I want is: I commit changes to my git repository. The pre-commit hook runs reuse to verify that what's being committed complies. Files that are not part of the commit (or, at least, files that have never been committed) are not checked and noncompliance of those files is irrelevant and does not block the commit from going through.

You speak of git stash -u. Am I supposed to include that instruction in the pre-commit setup somewhere? Where? As I said above, please update the documentation to provide the complete set of steps needed to accomplish this.

carmenbianca commented 1 year ago

What I want is: I commit changes to my git repository. The pre-commit hook runs reuse to verify that what's being committed complies. Files that are not part of the commit (or, at least, files that have never been committed) are not checked and noncompliance of those files is irrelevant and does not block the commit from going through.

Ah. Let's re-open this. There's a subtle detail here that I think we overlooked when closing this.

Other linters, pylint etc, when run via pre-commit, only run against staged/tracked files. This is a behaviour of pre-commit, which passes (I think) a list of files to the linters. reuse differs here, because there's presently no way to run the linter against a subset of files.

I think the solution lies not in ignoring untracked files; it lies in enabling the prior behaviour. I think there's an issue for this, but I can't find it. I'll rename this issue.

stephanlachnit commented 6 months ago

I'm quite interested in this since I've started using pre-commit as well.

I agree that I don't think having a different behavior for reuse lint that depends on the state of git is something that makes sense. However, what is the problem with a lint-file subcommand or lint --only option? This should be easy to implement. I think a --only subcommand makes more sense since a lint-file indicates that this is something common.

I even think the opposite, --exclude also makes sense, e.g. when you add packaging files on top and want to check the source. But this might be something for a different issue.

cdwilson commented 4 months ago

Use git stash -u

I tried git stash -u but it results in any staged files being removed from the git index (since the index is included in the stash). Since this command is likely being run when the reuse pre-commit hook fails, there are always files in the index when it is run, so removing those staged files from the index is not the desired behavior.

I found that I needed to do git stash -u --keep-index instead which keeps any staged files in the index. However, since the staged files are also included in the stash, after making the commit, I need to run git stash pop --no-index so that git restores any modified/untracked files, but does not try to restore the staged files to the index (since we already just committed them).

While this works, it's annoying. Is there a simpler way to accomplish the same thing?

What I want is: I commit changes to my git repository. The pre-commit hook runs reuse to verify that what's being committed complies. Files that are not part of the commit (or, at least, files that have never been committed) are not checked and noncompliance of those files is irrelevant and does not block the commit from going through.

FWIW, this is exactly the behavior I want (and expected) too. I just want any modified/untracked files to be ignored by the pre-commit hook, and only staged files in the git index to be checked for compliance.

Wuestengecko commented 4 months ago

While this works, it's annoying. Is there a simpler way to accomplish the same thing?

Well, there is already an open PR for this, which you could just use ;) I do that in some repos, so I can confirm that it actually works.

Just change your .pre-commit-config.yaml entry to:

  - repo: https://github.com/Wuestengecko/reuse-tool
    rev: 7d19244ab46b6a125d1a4dbe4a3ebb9ca767ab1a
    hooks:
      - id: reuse

Do note that you'll have to revert back to the official repo once the PR (or whatever other solution) gets merged, as I'll eventually delete my fork then.