Open cipriancraciun opened 2 years ago
I unfortunately can't make sense of your first paragraph. You mention the symlink is not recursed into, but also talk about skip_current_dir... So I'm not sure what you mean.
I think examples will help, or else I'll have to repro it when I get time. (Which might be a while.)
To be clear, I'm unsure what you're actually calling inconsistent.
I'll try to summarize the issue:
walkdir
with follow_links (true)
and same_file_system (true)
; (plus sorting by name, although I don't think this is essential;)r
, which contains:a
that points to a folder outside the same file-system r
resides on;b
that points to a folder within the same file-system r
resides on;a
and b
point to are not empty;c
;When iterating, walkdir
first returns a
which .file_type().is_dir()
(based on the .metadata()
of the pointed-to folder according to follow_links (true)
), then it returns b
which also is .file_type().is_dir()
, then it returns children from b
, then it returns c
. As expected it skips over a
's children because it's not on the same file-system although follow_symlinks (true)
was set.
However, if one tries to asses both a
and b
based only on the .metadata()
returned by walkdir::DirEntry
, one doesn't see any difference between them, both are .path_is_symlink()
and both are folders according to .file_type().is_dir()
. Thus if (somewhere down the line) some filter code decides that one of these folders should be not recursed-into and calls .skip_current_dir()
the behaviour differs in the two cases:
.skip_current_dir()
just after obtaining a
from .next()
, walkdir
stops recursing the parent of a
, namely r
;.skip_current_dir()
just after obtaining b
from .next()
, walkdir
stops recursing b
, thus returning other children from r
;The problem is that walkdir
treats both a
and b
as symlinks to folders (thus the .metadata()
is that of the pointed to folder) when it comes for DirEntry
purposes, but then it treats them differently when it comes to recursion and the behaviour of .skip_current_dir()
.
(For example I had tool that uses walkdir
and it worked just fine by ignoring two folders that happened to be symlinks to folders on the same file-system, but then when one of the folders was moved on a different file-system, the tool started to misbehave by stopping the iteration of the parent folder.)
My proposal was to fix this corner-case by either making .skip_current_dir()
take into account this particularity, or better, by making the DirEntry
that represents a
in our case be something akin to a "broken symlink", i.e. the .metadata()
should return the lstat
of the symlink.
To summarize: when it comes to symlinks, .same_file_system (true)
should take priority over .follow_links (true)
, and the code should behave (only when presented with a symlink pointing to another file-system and same_file_system (true)
) just as if walkdir
was configured with .follow_links (false)
.
For example in handle_entry
one should also check if .opts.same_file_system
was set and in that case only self.follow(dent)
only if the symlink points inside the same file-system as root.
https://github.com/BurntSushi/walkdir/blob/master/src/lib.rs#L821-L823
Thank you for that thorough explanation! That makes things much clearer.
by making the
DirEntry
that representsa
in our case be something akin to a "broken symlink", i.e. the.metadata()
should return thelstat
of the symlink.
So I do like this answer, but I also worry that it might be problematic to represent a non-broken symlink as if it were a broken symlink. It's difficult to come up with concrete problems here, but I am worried. On the other hand, if one thinks of "same file system" as a conceptual model where anything outside the file system the root directory is on is considered to basically not exist, then I think it does make sense...
Otherwise, I think it seems like the least worst option. I'd prefer not to add any new public APIs to fix this for example, because it's not reasonable to expect folks to deeply reason about this and correctly use some new API to work around it. The other option, making skip_current_dir
aware of this peculiarity, also seems less clean to me.
PRs for this are welcome, as I'm not sure when I'll have time to get around to this myself.
PRs for this are welcome, as I'm not sure when I'll have time to get around to this myself.
So, you would accept a PR which implements the "broken symlink" workaround as described above?
(I'm not promissing much, but if I get some time I'll try to implement it. At the moment I've made a workaround in my code that doesn't rely on .skip_current_dir()
.)
I'm not 100% certain I would take it, but I'm pretty sure. I can't really think of a better path here.
On Sat, Aug 20, 2022, 13:52 Ciprian Dorin Craciun @.***> wrote:
PRs for this are welcome, as I'm not sure when I'll have time to get around to this myself.
So, you would accept a PR which implements the "broken symlink" workaround as described above?
(I'm not promissing much, but if I get some time I'll try to implement it. At the moment I've made a workaround in my code that doesn't rely on .skip_current_dir().)
— Reply to this email directly, view it on GitHub https://github.com/BurntSushi/walkdir/issues/166#issuecomment-1221376188, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADPPYSMHXWPXFYZTUUO6XDV2ELMTANCNFSM57C7Y4WA . You are receiving this because you commented.Message ID: @.***>
When one uses
follow_links (true)
but alsosame_file_system (true)
, andwalkdir
encounters a symlink to a folder on a different file-system (which will thus not be recursed-into), it will report it as a file or folder (assuming the pointed-to exists), and thus callingskip_current_dir
will in fact skip iterating the parent folder of that symlink.I don't know if this is actually an issue within
walkdir
itself, or the code that useswalkdir
in such a manner.Perhaps the best solution is to add a new method
DirEntry::is_recursed
that states if the current entry is or not actually the subject of recursion, and thus if callingskip_current_dir
will affect this entry and not its parent.Another solution, perhaps best if viewed from a security point-of-view, is to treat symlinks to entities outside the current file-system as broken links, thus the
DirEntry::file_type
andDirEntry::metadata
would actually belong to the symlink and not to the pointed-to file or folder.