Closed EliahKagan closed 2 months ago
I've edited the description to note that the code I quoted is not necessarily where a change has to be made, since Ignored
taking precedence over Pruned
does make sense outside of specific cases.
With us being in different time zones, it would have been useful had I contributed some part of the tests or fix between the initial report and now. Unfortunately I was quite distracted and, while I was able to make a bit of progress on other gitoxide-related things, I didn't really come up with anything for this.
Thanks a lot for the incredibly detailed analysis of the problem.
I also feel that ideally, a solution would make such kind of mistakes impossible, but I am also not sure this can happen. More will be known once I dive into the code.
In addition to the above, the effect of
-x
and-d
on actual nested repositories, especially if they are to continue to delete them under some conditions even in the absence of-r
, should be documented explicitly, including in the output ofgix help clean
. But that could be done separately from the fix for this bug.
I'd hope you will be able to submit a PR with the documentation improvements you would like to see - I cannot imagine anyone better suited to make them.
Besides that, I hope that the default messaging around the removal of ignored directories works:
❯ cargo run --bin gix -- -r /Users/byron/dev/github.com/nabijaczleweli/cargo-update clean -xd
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.23s
Running `target/debug/gix -r /Users/byron/dev/github.com/nabijaczleweli/cargo-update clean -xd`
WOULD remove / (🗑️)
WARNING: would remove repositories hidden inside ignored directories - use --skip-hidden-repositories to skip
Even though the example is taken from the buggy version that is unaware this is the root, it does warn that ignored repositories would be removed. Overall, I like the way it guides the user towards adding flags to change the set of would-be-cleaned files.
Regarding /
, I think I should add tests around that to see what happens - it's nothing I really considered which makes it likely to exhibit surprising behaviour. In practice as demonstrated here, it seems to be another way of matching all with *
though.
Regarding the !<file>
not having any effect, I think the real issue here is that it doesn't care about these at all even though it would probably identify them as tracked. The directory walk probably disables listing these, so it won't see them at all. This leaves the issue at incorrectly classifying the root-directory as something that can be deleted. Additional tests around ignoring .git
should probably also be added for good measure.
The key seems to be that the top-level directory is taken to match the entry
*
, causing its contents all to be deleted even if some of them match!
exclusions.
Agreed.
In addition, combined with the above results, this raises the question of whether
gix clean -xde
without-r
is actually intended to delete any actually nested repositories:
The intended behaviour is that it allows itself to remove ignored git repositories which are contained in ignored directories. That's easy to do as it may not even look deeper unless the --skip-hidden-repositories
option is provided.
The root cause is probably that it's possible to declare the top-level directory as ignored, which allows contained repositories to be removed as well.
❯ cargo run --bin gix -- -r /Users/byron/dev/github.com/nabijaczleweli/cargo-update clean -xd --skip-hidden-repositories all -r
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.65s
Running `target/debug/gix -r /Users/byron/dev/github.com/nabijaczleweli/cargo-update clean -xd --skip-hidden-repositories all -r`
WOULD remove repository .git/ (🗑️)
This invocation shows the behaviour quite nicely, as it now found the 'hidden' repository itself, and flagged it for deletion as requested. Of course, that shouldn't be possible here. It's notable that this also won't be possible for submodules as these should be identified as 'tracked', but the root of the repository is never tracked.
So I'd think this is a very 'local' bug as it really can only apply to the root of the repository, or so I'd think. Of course, I encourage you to prove that wrong.
In addition to the above, the effect of
-x
and-d
on actual nested repositories, especially if they are to continue to delete them under some conditions even in the absence of-r
, should be documented explicitly, including in the output ofgix help clean
. But that could be done separately from the fix for this bug.I'd hope you will be able to submit a PR with the documentation improvements you would like to see - I cannot imagine anyone better suited to make them.
Besides that, I hope that the default messaging around the removal of ignored directories works:
❯ cargo run --bin gix -- -r /Users/byron/dev/github.com/nabijaczleweli/cargo-update clean -xd Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.23s Running `target/debug/gix -r /Users/byron/dev/github.com/nabijaczleweli/cargo-update clean -xd` WOULD remove / (🗑️) WARNING: would remove repositories hidden inside ignored directories - use --skip-hidden-repositories to skip
Even though the example is taken from the buggy version that is unaware this is the root, it does warn that ignored repositories would be removed. Overall, I like the way it guides the user towards adding flags to change the set of would-be-cleaned files.
Yes. Assuming it is intended that -r
not always be required to allow nested repositories to be removed--which I think is confirmed by your subsequent description of its interaction with --skip-hidden-repositories
--that output is good. But then the way -r
/--repositories
is documented, at least in the output of gix help clean
(i.e. the documentation here), is misleading:
-r, --repositories
Remove nested repositories
This makes it seem not only that -r
causes nested repositories to be removed, which is true, but that, without -r
, nested repositories are not removed.
I'll think about how to improve that wording and open a pull request to do so.
Regarding
/
, I think I should add tests around that to see what happens - it's nothing I really considered which makes it likely to exhibit surprising behaviour. In practice as demonstrated here, it seems to be another way of matching all with*
though.
As far as how gitoxide
currently treats /
as a full path, I think so.
But git
does not itself treat /
as a full path the same way. For example, in the above experiments, listing /
did not cause git
to regard .gitignore
as ignored--when making the /
test repository, I didn't have to have !.gitignore
, or any other entry appended after /
, to successfully stage .gitignore
by running git add .
. Repeating them now confirms this, and also files with names like a
are not ignored by git
due to /
in .gitignore
. For git
, I am not sure /
as a full entry in .gitignore
has any effect at all.
Regarding the
!<file>
not having any effect, I think the real issue here is that it doesn't care about these at all even though it would probably identify them as tracked. The directory walk probably disables listing these, so it won't see them at all. This leaves the issue at incorrectly classifying the root-directory as something that can be deleted. Additional tests around ignoring.git
should probably also be added for good measure.
That makes sense, because if I ignore foo
and unignore bar
then foo/bar
is still ignored due to being in the foo
directory.
So I'd think this is a very 'local' bug as it really can only apply to the root of the repository, or so I'd think. Of course, I encourage you to prove that wrong.
I would not consider this to disprove that, but I have found that when the current repository is a submodule with a .git
file and its own .gitignore
lists .git
, deleting the .git
file in a submodule does not require -r
.
That pertains to "Case 5" in the Expected behavior section of the issue description. I had not tried it at that time. I have tried it out now, by doing a recursive clone of has-submodule and experimenting with .gitignore
.
It's notable that this also won't be possible for submodules as these should be identified as 'tracked'
Yes, I do think case 6 is already not a problem, and the experiments I've just done seem to confirm this.
I'll think about how to improve that wording and open a pull request to do so.
Thank you 🙏
But
git
does not itself treat/
as a full path the same way. For example, in the above experiments, listing/
did not causegit
to regard.gitignore
as ignored--when making the/
test repository, I didn't have to have!.gitignore
, or any other entry appended after/
, to successfully stage.gitignore
by runninggit add .
. Repeating them now confirms this, and also files with names likea
are not ignored bygit
due to/
in.gitignore
. Forgit
, I am not sure/
as a full entry in.gitignore
has any effect at all.
Thanks for researching this. I will add this to the baseline tests (comparing Git results with gitoxide
) so that it's clear these match. Once again, I think this case is currently missing.
That makes sense, because if I ignore
foo
and unignorebar
thenfoo/bar
is still ignored due to being in thefoo
directory.
I mean, this is how gix clean
does it - it only looks one dir-entry at a time and asks for whether it's igonred or precious. And I don't know what would happen if /foo\n!bar
is used in .gitignore
and someone tries to add foo/bar
. Probably foo/bar
would then be allowed, at least it should be as per the rules. In case you are interested, this should be verifiable with gix exclude query
.
I would not consider this to disprove that, but I have found that when the current repository is a submodule with a
.git
file and its own.gitignore
lists.git
, deleting the.git
file in a submodule does not require-r
.
Thank you, that sounds wrong, too, and I will take a look.
To clarify the part about the .git
file, the repository is the submodule, and I mean that deleting the repository's own .git
file does not require -r
. (I was not trying to describe a situation with nested submodules.)
This is likely what you took me to mean, but I just realized that was extremely ambiguous, so I figured I'd clarify just in case.
I'll think about how to improve that wording and open a pull request to do so.
Thank you 🙏
No problem! I'm holding off for a bit in doing so, since #1464 might affect what ought to be said.
Current behavior 😯
gix clean -xde
will delete the entire top-level repository it is operating on, including tracked files and the.git
directory itself--thus the whole local history--if a.gitignore
file contains*
or/
. This seems to happen because the repository itself is identified as an untracked nested repository.Illustration with a real-world example
One approach to writing
.gitignore
is to list*
followed by!
exclusions. In such a repository, runninggix clean -xde
deletes the entire contents of the repository directory, causing inconvenience and the loss of any unpushed data. For example, on thecargo-update
repository:That is on Ubuntu 22.04 LTS.
Windows is affected when outside the directory
Although Windows superficially appears unaffected because open files and directories cannot usually be deleted, the entire repository directory will still be deleted if one runs
gix -r cargo-update clean -xde
from the parent directory. (This-r
is agix
option and should not be confused withgix clean -r
.) So really all systems are affected, though to different degrees.The
/
means the repositoryThe
/
is in each case referring to the top-level directory of the repository. It fortunately does not refer to the actual root of the filesystem. Likewise:..
components in.gitignore
do not cause files outside of the repository to be deleted..gitignore
patterns that cause only some of the contents of.git
to be deleted. Although this may seem like cold comfort, it is actually a major mitigating factor, because partial deletion of.git
would not necessarily be noticed and could result in situations that could be much harder to recover from, since local refs could be silently removed, recreated with different referents, and then force-pushed to a remote without awareness that they were replacing preexisting conceptually unrelated tags or branches.However, all local branches and their history (except the tip of any branches checked out in separate worktrees and unmodified), the index, any stashes, and other objects available through the reflog, can all be deleted.
Some relevant code
It looks like, in traversals, paths that are not conceptually part of the repository are pruned, except that when a
.git
directory would be pruned for this reason but also matches a.gitignore
entry, then it is instead retained and given theIgnored
status:https://github.com/Byron/gitoxide/blob/15f1cf76764221d14afa66d03a6528b19b9c30c9/gix-dir/src/entry.rs#L43-L49
But this is not necessarily to say that this aspect of the classification has to be changed. It may be suitable for most repositories found during traversal, just not for the top-level working tree or
.git
directory, nor for any submodules or their.git
files.Expected behavior 🤔
One never expects cleaning to remove tracked files or cause data loss in the repository's local history, much less loss of the whole history.
Although the behavior described here was not intended, and exposition may not be required to demonstrate that this is a bug, I've nonetheless detailed what I think is the expected behavior below. This is because I think it may be useful identifying current or future cases of the bug, some of which are less obvious than others.
A
*
entry in a.gitignore
has real-world uses (as shown in the example of thecargo-update
repository) and should be taken as a pattern that matches all files, which subsequent!
exclusions can then make exceptions to. A/
might likewise be used deliberately, though I'm not sure I have seen it outside of testing. Furthermore, the ability to delete nested untracked repositories is a deliberate feature ofgix clean
when passed some combinations of options, and a very useful feature at that. However:gix -r ... clean -xde
. This would apply even if the directory were actually empty due to a nonstandard worktree. (This-r
is agix
option and should not be confused with-r
as agix clean
option as presented below in case 4.).git
..git
directory nor anything inside it should ever be deleted in a clean, even if intentionally cleaning ignored files and even if intentionally including ignored subdirectories that are themselves repositories. Although the current repository's.git
directory is "ignored" in the sense that commands likegit add .
do not stage anything from it, this behavior is separate from, and supersedes, the effect of.gitignore
. The.git
directory is effectively a void in the working tree..git
is specified explicitly in.gitignore
and-r
is included in the options passed togix clean
, the.git
directory should not be deleted. Currently adding-r
will make this happen, covered in "Steps to reproduce" below. (This-r
is agix clean
option and should not be confused with-r
as agix
option as presented above in case 1.).git
file instead of a.git
directory. This likewise should not be deleted when runninggix clean
in the submodule, irrespective of the options togix clean
or the contents of any.gitignore
file. I have not tested this case yet..gitmodules
--those submodules should not be deleted bygix clean
, since submodule directories are likewise voids in the superproject's working tree, rather than being ignored due to.gitignore
. I have not tested this case yet either.Cases 2 and 3 are the most important, because they are the most common and they are the most likely to cause data loss, especially case 3.
Although I have not tested cases 5 and 6 (yet), I mention them because it seems like incorrect behavior for them might be easy to bring about by accident when fixing this bug for the other cases.
Notes on
-p
and-r
One possible implementation approach comes to mind for a fix that I want to recommend against. While
gix clean
recognizes precious files, I recommend against allowing any of the above to occur even when-p
is passed. I think regarding.git
as typically ineligible for deletion by automatically considering it a precious directory would still be far from strong enough protection. It also doesn't really fit conceptually: I believe that precious files are conceptually those that should usually not be deleted due to their status as being important for reasons independent of source control.In addition to the above, the effect of
-x
and-d
on actual nested repositories, especially if they are to continue to delete them under some conditions even in the absence of-r
, should be documented explicitly, including in the output ofgix help clean
. But that could be done separately from the fix for this bug.Git behavior
No one-to-one comparison...
There is no exact comparison to
git clean
behavior, becausegix clean
deliberately deletes ignored nested repositories when-x
and-d
are passed (provided that-e
is passed to allow it do anything at all). It furthermore seems to do so intentionally even without-r
, though as examined below in Steps to reproduce, perhaps this is unintentional in the absence of-r
.More broadly,
gix clean
is not intended to behave exactly the same asgit clean
, as detailed in #1308....but
git
behavior is relevantHowever, it's true that
git clean
does set strong expectations for what kinds of deletions are within the ambit of cleaning, and no way of usinggit clean
produces this effect.For example, running
git clean -xdf
in thecargo-update
repository used as an example above does not delete any tracked files or the.git
directory.Steps to reproduce 🕹
To reproduce this, one can carry out the example shown above with the non-toy
cargo-update
repository, which confirms the practical significance.Simplified reproducer
One can alternatively run the following commands to reproduce it with a simple repository. These and all commands shown in this section were tested in Ubuntu 22.04 LTS.
With full output:
Demonstration that this relates to nested repository handling
Some of the above commands are not necessary to confirm the bug but illustrate relevant aspects of it. For example, consider this part of the output of the dry-run
gix clean -xdn
command run before doing the real clean:This strongly suggests that the problem relates to the way code in
gix::dirwalk
identifies nested repositories.Variation with
/
The above commands can be repeated with a
/
entry instead of*
to confirm that it also happens with that. This works both with and without the second line,!.gitignore
, sincegit
itself treats a/
entry in.gitignore
not to cover.gitignore
. Actually I am not sure what a/
in.gitignore
is supposed to do.Variation with tracked files that are not special
Although the real-world example with
cargo-update
, as well as the minimal example where.gitignore
lists*
and!.gitingore
, illustrate that files excluded from being ignored by matching a!
pattern that comes after*
are not spared, here's a minimal example focused on that:That produces:
The key seems to be that the top-level directory is taken to match the entry
*
, causing its contents all to be deleted even if some of them match!
exclusions.Variation listing
.git
and passing-r
Making the
.gitignore
file contain.git
does not causegix clean -xde
to delete.git
, but it does causegix clean -xdre
to delete.git
. (Note that this-r
is an option togix clean
and should not be confused with the-r
option togix
before a subcommand, which specifies a repository to operate on.) This can be seen by running the commands:Here's what that looks like:
When should
-r
affect what happens?I believe this should not happen even with
-r
.In addition, combined with the above results, this raises the question of whether
gix clean -xde
without-r
is actually intended to delete any actually nested repositories:gix clean -xde
without-r
is currently a good command to delete both build output and generated archives when run ingitoxide
's own repository--might be considered part of this bug, or its own separate related bug.Other variations with
.git
and-r
Listing
/.git
has the same effect as.git
at least when the current directory is the top-level directory of the repository.Listing paths under
.git
in.gitignore
, such as with the line.git/config
or/.git/config
, fortunately has no effect. It does not appear that a.gitignore
entry can causegix clean
with any combination of options to attempt to delete only some files inside.git
.