BurntSushi / walkdir

Rust library for walking directories recursively.
The Unlicense
1.21k stars 106 forks source link

Parent directory appears out of order with `contents_first(true)` in certain circumstances #163

Open danielparks opened 1 year ago

danielparks commented 1 year ago

I’m not sure what’s going on here. I‘m happy to spend some time sorting this out, but I figured I’d report it first to see if it was obvious to you.

This is easier to understand by example than by explanation. Nevertheless, I’ll try to define it first.

Explanation

When:

  1. A certain directory structure is required. I’m not sure what the criteria are.
  2. Walkdir.new() is passed a symlink to the directory structure.
  3. contents_first(true)

and walkdir is ascending out of a certain subdirectory, it will return another entry in the parent directory before the subdirectory itself.

Example

use walkdir::WalkDir;

fn main() {
    let _ = std::fs::create_dir_all("dir/mdir/ndir");
    let _ = std::fs::File::create("dir/afile");
    let _ = std::os::unix::fs::symlink("dir", "link");

    walk("dir");
    println!("");
    walk("link");
}

fn walk(root: &str) {
    let mut last_depth: usize = 0;

    println!("walking {}", root);
    for entry in WalkDir::new(root).contents_first(true) {
        let entry = entry.unwrap();
        let depth = entry.depth();

        println!("{}", entry.path().display());
        if last_depth > depth && !entry.file_type().is_dir() {
            eprintln!("    Left directory but next entry was not directory itself");
        }

        last_depth = depth;
    }
}

Output:

walking dir
dir/mdir/ndir
dir/mdir
dir/afile
dir

walking link
link
link/mdir/ndir
link/afile
    Left directory but next entry was not directory itself
link/mdir

I would expect walking the symlink to either behave the same as walking the directory, or just refuse to traverse the symlink (since follow_links defaults to false).

danielparks commented 1 year ago

Actually, if you set follow_links(true) they both work. So probably the bug is that it follows the first symlink, which violates some assumption.

danielparks commented 1 year ago

Well, I have a fix, but I’m not quite sure why it works. The problem seems to be related to not adding the root symlink to deferred_dirs.

In other words, change https://github.com/BurntSushi/walkdir/blob/461f4a8708eed4ea6cf14ac6f775a197ec61eee8/src/lib.rs#L844-L846 to:

            if md.file_type().is_dir() {
                itry!(self.push(&dent));
                if self.opts.contents_first {
                    self.deferred_dirs.push(dent);
                    return None;
                }
            }

(I suspect that doesn’t match your style, so this is more to make things clear.)

BurntSushi commented 1 year ago

Thanks for the diagnosis and coming up with a fix!

Would you mind submitting a PR? And in particular, it would be good to come up with a comment explaining that branch and why it fixes the bug.

I've had a quick look at the code, but unfortunately the contents_first option really ties the code up in knots and it is difficult to follow. So I'm not able to re-contextualize quickly.

danielparks commented 1 year ago

Sure!

Yeah, definitely a little hard to follow. The price of flexibility, I suppose.

It will probably be a few days at least — I’d like a better understanding so I can be sure I'm fixing it right.

danielparks commented 1 year ago

Looks like there’s another bug when both same_file_system and contents_first are true and a subdirectory is a mount. I’ll fix that and write tests for it too.

I’m pretty sure I’m not breaking anything on Windows, but I don’t have recent Windows license to check (and even if I did I know nothing about reparse points beyond what I gleaned from https://github.com/rust-lang/rust/issues/46484).

danielparks commented 1 year ago

I’m pretty sure I’m wrong about there being a bug related to same_file_system. Still working on my understanding to be sure I get it right.