git-for-windows / git

A fork of Git containing Windows-specific patches.
http://gitforwindows.org/
Other
8.28k stars 2.51k forks source link

Recursive gitignore using ** does not work as expected from an intermediate directory #3587

Closed uecasm closed 2 years ago

uecasm commented 2 years ago

Setup

Tested in git version 2.33.1.windows.1, git version 2.34.0.windows.1, and finally:

git version 2.34.1.windows.1
cpu: x86_64
built from commit: 2ca94ab318509b3c271e82889938816bad76dfea
sizeof-long: 4
sizeof-size_t: 8
shell-path: /bin/sh
feature: fsmonitor--daemon
Microsoft Windows [Version 10.0.19043.1348]
(c) Microsoft Corporation. All rights reserved.
Editor Option: Nano
Custom Editor Path:
Default Branch Option:
Path Option: Cmd
SSH Option: OpenSSH
Tortoise Option: false
CURL Option: OpenSSL
CRLF Option: CRLFCommitAsIs
Bash Terminal Option: MinTTY
Git Pull Behavior Option: Rebase
Use Credential Manager: Enabled
Performance Tweaks FSCache: Enabled
Enable Symlinks: Enabled
Enable Pseudo Console Support: Disabled
Enable FSMonitor: Disabled

No.

Details

Occurs in both Git Bash and TortoiseGit.

git init /tmp/test
cd /tmp/test
mkdir -p a/b/c/devl/test
echo 'devl/**' > a/b/.gitignore
echo '!devl/*.dat' >> a/b/.gitignore
touch a/b/c/one.txt
touch a/b/c/devl/two.txt
touch a/b/c/devl/three.dat
touch a/b/c/devl/test/four.txt
touch a/b/c/devl/test/five.dat
git add a
git status
Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   a/b/.gitignore
        new file:   a/b/c/devl/three.dat
        new file:   a/b/c/one.txt
Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   a/b/.gitignore
        new file:   a/b/c/devl/test/five.dat
        new file:   a/b/c/devl/test/four.txt
        new file:   a/b/c/devl/three.dat
        new file:   a/b/c/devl/two.txt
        new file:   a/b/c/one.txt

In particular, the files inside devl should be ignored (with the exception of the .dat file directly within devl) but instead it is not ignoring anything.

Not needed, complete repro above.

If the prefix /**/ (or just **/, but that seems less correct) is added before the .gitignore rules then this behaves as expected. However requiring that prefix is not expected since the rule should be applied recursively below that directory where no prefix is specified. (Essentially, lacking a / prefix should be exactly equivalent to having a /**/ prefix, according to documentation. But that is not the case here.)

vassudanagunta commented 2 years ago

The observed behavior is entirely consistent with .gitignore rules:

  • If there is a separator at the beginning or middle (or both) of the pattern, then the pattern is relative to the directory level of the particular .gitignore file itself. Otherwise the pattern may also match at any level below the .gitignore level.

  • For example, a pattern doc/frotz/ matches doc/frotz directory, but not a/doc/frotz directory; however frotz/ matches frotz and a/frotz that is a directory (all paths are relative from the .gitignore file).

Specifically, the devl/** entry within a/b/.gitignore has no impact since nothing in the directory tree matches a/b/dev1/**.

To achieve the desired results, a/b/.gitignore should instead have the following

 **/devl/**
!**/dev1/*.dat

The reporter's understanding as quoted below is wrong.

Additional note:

If the prefix // (or just /, but that seems less correct) is added before the .gitignore rules then this behaves as expected. However requiring that prefix is not expected since the rule should be applied recursively below that directory where no prefix is specified. (Essentially, lacking a / prefix should be exactly equivalent to having a /**/ prefix, according to documentation. But that is not the case here.)

uecasm commented 2 years ago

That doesn't make sense. A filename-only pattern is applied as-is in the same directory and in every subdirectory. The same should be true for a pattern that includes a directory but does not have a leading / to make it non-recursive. Otherwise the distinction of leading / or not in patterns that contain a directory is completely pointless.

Still, I overlooked the "or middle" part of the doc, so mea culpa on that, although I still think it's silly.

vassudanagunta commented 2 years ago

It makes sense to me just as it no doubt made sense to Linus Torvalds: relative paths are always relative to the current context, just as they are in Unix, just as they are in HTTP requests. What .gitignore does is make one special and sensical exception: *If it is a name pattern, as opposed to a path pattern, then it matches all filenames and directory names regardless of directory (from here down).

If you think about what typical people need to do, it's a smart set of rules:

The emphasized "the" in the last two are intentional: people don't generally want to accidentally match files in unexpected places. So path pattens should be relative to the current dir, not all dirs.

uecasm commented 2 years ago

There was already a syntax for "I want to match only files/dirs in this directory, not subdirectories", which is to prefix the pattern with /.

As such, this would have made sense:

See how all of the above are a completely consistent pattern all the way across the board? No special exceptions.

But with the rules as they are, /test/**/src/ and test/**/src/ behave exactly the same, and you instead have to have a leading **/ to indicate you want recursive descent, both of which break the pattern.

vassudanagunta commented 2 years ago

There was already a syntax for "I want to match only files/dirs in this directory, not subdirectories", which is to prefix the pattern with /.

Absolutely wrong. A leading / has semantics that are well understood and deeply internalized by nearly 100% of programmers: "absolute path", starting from a fixed root. As I pointed out in my preceding reply, the well understood way to express what you describe above is a path without a leading slash.

🚩 *If you are going to dispute this basic truth, don't bother reading further. There is nothing more I can say to you, except perhaps by daring you to show this thread to reputable programmers and see if they agree with you and choice (D) below.

99.9% of people looking at a .gitignore with the patterns in your examples will not read them with the semantics you give them. Your semantics goes against everyone's now intuitive built-in understanding of absolute and relative paths.

Given established path semantics, we face the following design choice:

(A) is out of the question. You chose (D). Linus Torvalds chose (C). I would have chosen (B) or (C).

It was the next design question and choice made that is the cause of confusion: The most common thing to do in a .gitignore is to ignore ALL files of a particular extension or pattern wherever they exist in the tree, e.g. all `.tmpfiles. Can we make this as easy as possible with a simple.tmp` pattern?

The choices were:

The best choice for semantic consistency would have been (B) + (NO). The best choice for both simplicity and terseness for most .gitignore files at the expense of some semantic consistency is (C) + (YES).

uecasm commented 2 years ago

Absolutely wrong. A leading / has semantics that are well understood and deeply internalized by nearly 100% of programmers: "absolute path", starting from a fixed root. As I pointed out in my preceding reply, the well understood way to express what you describe above is a path without a leading slash.

No, you are incorrect. While that's true outside of gitignore, it has always had that meaning inside of gitignore. In fact the only way to tell gitignore to "ignore just this test directory" is with /test/ (or /test, if you want it to ignore either a file or directory) -- which looks like an absolute path, but isn't. There's no such thing as an absolute path in gitignore.

This was perhaps also a design mistake (due to the confusion with absolute paths), but it predated support for ** and nested paths, so by the time the syntax was extended we were stuck with leading slashes. I remain unconvinced that introducing more inconsistency with the new syntax has improved things.

vassudanagunta commented 2 years ago

Your claim contradicts that /src/*.tmp has always meant the same thing as src/*.tmp. Your claim also contradicts that src/*.tmp has always been a relative path to the dir containing the .gitignore, not all directories lower in the tree as you claim it should be.

The only explanation is what I described as option (C).

You remain unconvinced. That's your prerogative. I'm glad it was Torvalds not you that made the call. What have you created that's been adopted by so many people?

uecasm commented 2 years ago

No, the explanation is that they just adopted the semantics of fnmatch for implementation convenience, so whatever it did was automatically "correct". It was not a consistent design.

Regarding the latter paragraph, this is the only response it deserves.

vassudanagunta commented 2 years ago

The logical fallacy is yours. I did not use an appeal to authority to make a logical claim, so it is logically fallacious to counter what I said with claims of logical fallaciousness.

I have great respect for Torvalds's work and the things he accomplished -- the proof is in the pudding. I have little respect for people who's primary attitude is one of arrogance and disrespect. In fact, I'm not a fan of Torvald's arrogance and his frequent unkindness. But at least his arrogance is backed by solid judgement and analysis and good works. And I sympathize with him as well as much of his harshness is because he has to deal with people like you.

I'm not going to bother addressing your claim that your "design" is consistent. Anyway, I'm done. I know I'm not the first in your life who has told you you are arrogant and mean, and I know I won't be the last.

PhilipOakley commented 2 years ago

@vassudanagunta @uecasm A better question / collaboration would be to look at how the Git documentation can be improved to avoid the many instances of folks being confused/mistaken/surprised by the semantic.

As you will have seen it can be hard to correct mistaken impressions or remember how it was before you knew, but if we can improve the docs / man page then it would greatly reduce friction.

One issue I see with the docs is that they tend to take the positive assertion line in explaining actions and fail to include negative 'stopping' assertions (often on the basis that they are 'generalisations'). This leaves those who are on the wrong track to double-down on their misunderstanding. I'm not so sure that letting the docs abdicate to the fnmatch is a great explanation. If we can create a better clarification then that would help many.

derrickstolee commented 2 years ago

The root cause here is likely a bug that was introduced in Git 2.34.0. Here is the mailing list discussion with a fix in this message.

If you follow that thread, you'll find that the logic around the ** operator and the directory matching expectations is quite complicated, and the documentation does not do a good job of explaining it. Despite my best attempts, all I can say is "Whatever Git did before was correct" and "It has something to do with how unpack_trees() walks directories".

It looks like this bug was fixed in time for 2.34.1, so if you are using Git for Windows 2.34.1, then the behavior should be the same as 2.33.1.

dscho commented 2 years ago

@vassudanagunta @uecasm could I ask you to please keep in mind that we're all learning here, and help each other to learn? There are human beings on all ends of this conversation, after all. I would hope that we can demonstrate the same respect that we would like to be awarded in return. Thank you.

vassudanagunta commented 2 years ago

@PhilipOakley I would be happy to help or attempt to help to improve the documentation. I'm using stock git, not git-for-windows. Can I assume that the current behavior is correct? I guess I'll have to figure out were open issues are tracked.

@derrickstolee I'm not sure that bug explains why @uecasm's expectation that a/b/.gitignore with:

devl/**

should ignore a/b/c/devl/ and everything below it. He claims that devl/**, because it does not have a leading /, "should be applied recursively below that directory", which he means all of the following should be excluded:

a/b/devl/
a/b/c/devl/
a/b/c/d/devl/
a/b/something-else/devl/

In essence, he believes that devl/** should work as **/devl/** does.

vassudanagunta commented 2 years ago

@dscho I agree, and I do try, I honestly do. But I came to this issue after reading @uecasm's post on StackOverflow and witnessed his dismissive and mean attitude toward all the people who were trying to help him, and his categorical assertion that the .gitignore rules are "silly", without the slightest respect or deference to its creators, that there might be other perspectives.

I should work on my breathing exercises.

uecasm commented 2 years ago

Inconsistent rules are silly. Sometimes silly things are unavoidable due to backwards compatibility, which is definitely the case here. That has never been a disparagement of the creators, it happens to everyone. (It's also why there's a new JavaScript framework every couple of years -- people get annoyed enough at the accumulation of backwards compatibility hacks and inconsistency to burn it all down and start over, before inevitably suffering the same fate themselves.)

In essence, he believes that devl/** should work as **/devl/** does.

Yes, that was true. I have already accepted that it's unlikely that this can be "fixed" without the use of a time machine, and that it is indeed currently working as documented, which is why I closed the issue. I disagree with @vassudanagunta's assertion that I have been rude at any point -- I have been dismissive of some "answers", including his, on StackOverflow, but that is because they were not answering the question actually asked (and just reposting things already in the question as if they answered anything), so the dismissiveness is entirely justified. I think he's taking it too personally. But whatever, I don't think there's any point in continuing discussion on that topic either.

Regarding my specific overlooking of that part of the docs, I don't think that is the fault of the docs as such; I had just internalised the idea that "leading slash means current directory only, no leading slash means recursive descent" from the use of GUI tools that automatically update the gitignore file that way, and so was not expecting the rule to change when there is also an interior slash. I did read the docs to try to confirm but I'm not sure whether I perhaps read an older version of the doc or if I just glossed over that part; the brain is excellent at skipping details that it thinks it already knows, even when wrong.

dscho commented 2 years ago

@vassudanagunta honestly, when I read your scathing judgement, I don't find your behavior any better than the behavior of @uecasm that you criticize. So, both of you, can you please aim for a more civil tone? What I had to read in this ticket was no joy, and that was not due to the technical issues that were discussed, but exclusively due to these unnecessary personal attacks. We're not here to attack each other. We're here to learn, and to improve Git for Windows. And doing this in a kinder, more respectful way just makes everything much easier. Including the life of the maintainer who has to read these conversations.

vassudanagunta commented 2 years ago

to be fair, @dscho, my "scathing judgement" resulted from a feeling similar to what you feel now. I volunteer on StackOverflow, as do many others, purely to help other people, for no benefit other than the joy of helping. I don't do it for my resume. I quit the tech industry to work with children, to write about social inequity and injustice issues and the cultural reasons that underly them, reasons which include the elevation of meanness, entitlement and selfishness masquerading as respecting intelligence, individualism and freedom. On the flip side, when it comes to calling out these issues, politeness has priority over truth. Progress on racism, sexism and so many other social ills continue at a snail's pace because otherwise good, decent people stay silent or just go with the flow, choosing not to make their friends and coworkers uncomfortable, or not to risk one's social standing in the world's privileged classes, by speaking out whenever behavior is unfair, mean or unjust. Is a "civil tone" truly civil when the result is an uncivil world?

Am I ultra-sensitive to these issues? Yes. Because I feel all our social ills stem from the smallest things we do, not just blatant crimes or injustices.

Is my approach the right way to change the tone of the world? I don't know. Maybe not. Just keep in mind that while the software industry, both commercial and open source, may make many technical innovations, its track record on social justice, empathy and social good is not great.

Anyway, I won't say anymore. If you read to this sentence, thank you for that.

dscho commented 2 years ago

@vassudanagunta thank you for sharing your background with us. As to what you wrote: it could have been kinder and more respectful. And I hope that you will improve that in the future.