BurntSushi / walkdir

Rust library for walking directories recursively.
The Unlicense

`filter_entry` misbehaves when `contents_first` is enabled #171

Open selendym opened 1 year ago

selendym commented 1 year ago

Hello.

It seems that when contents_first is enabled, filter_entry misbehaves when the deferred directories are filtered out.

Example:

fn main() {
    let _ = std::fs::create_dir_all("/tmp/walkdir/foo");
    let _ = std::fs::create_dir_all("/tmp/walkdir/bar");
    let _ = std::fs::File::create("/tmp/walkdir/foo/file");
    let _ = std::fs::File::create("/tmp/walkdir/bar/file");

    println!("Without filter_entry:");
    for entry in walkdir::WalkDir::new("/tmp/walkdir")
        .contents_first(true)
        .into_iter()
    {
        println!("  {entry:?}");
    }

    println!("With filter_entry:");
    for entry in walkdir::WalkDir::new("/tmp/walkdir")
        .contents_first(true)
        .into_iter()
        .filter_entry(|entry| !entry.file_type().is_dir())
    {
        println!("  {entry:?}");
    }
}

Output:

Without filter_entry:
  Ok(DirEntry("/tmp/walkdir/bar/file"))
  Ok(DirEntry("/tmp/walkdir/bar"))
  Ok(DirEntry("/tmp/walkdir/foo/file"))
  Ok(DirEntry("/tmp/walkdir/foo"))
  Ok(DirEntry("/tmp/walkdir"))
With filter_entry:
  Ok(DirEntry("/tmp/walkdir/bar/file"))

Expected output for the latter part:

With filter_entry:
  Ok(DirEntry("/tmp/walkdir/bar/file"))
  Ok(DirEntry("/tmp/walkdir/foo/file"))

This seems to be caused by double popping IntoIter.stack_list:

As a fix, a check for contents_first in https://github.com/BurntSushi/walkdir/blob/master/src/lib.rs#L1051 could perhaps work. This would not fix direct usage of skip_current_dir, however.
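
The double pop described above can be illustrated with a small, std-only toy model. This is a sketch under stated assumptions, not walkdir's actual code: `Node`, `Walker`, `walk_filtered`, and `sample_tree` are hypothetical names, and the model assumes only that a contents-first walker pops a directory's stack frame before yielding the directory, and that the filter then calls `skip_current_dir` for any rejected directory. The `guard_contents_first` flag stands in for the check suggested above.

```rust
// Toy model only: NOT walkdir's real implementation. All names here
// (Node, Walker, walk_filtered, sample_tree) are hypothetical.

struct Node {
    path: String,
    is_dir: bool,
    children: Vec<Node>,
}

impl Node {
    fn file(path: &str) -> Node {
        Node { path: path.to_string(), is_dir: false, children: Vec::new() }
    }

    fn dir(path: &str, children: Vec<Node>) -> Node {
        Node { path: path.to_string(), is_dir: true, children }
    }
}

/// Contents-first walker: one stack frame per directory being listed.
struct Walker {
    stack: Vec<(Node, std::vec::IntoIter<Node>)>,
}

impl Walker {
    fn new(mut root: Node) -> Walker {
        let kids = std::mem::take(&mut root.children).into_iter();
        Walker { stack: vec![(root, kids)] }
    }

    /// Abandon the directory currently being listed by popping its frame.
    fn skip_current_dir(&mut self) {
        self.stack.pop();
    }
}

impl Iterator for Walker {
    type Item = Node;

    fn next(&mut self) -> Option<Node> {
        loop {
            let next_child = self.stack.last_mut()?.1.next();
            match next_child {
                // Descend into a subdirectory; it is yielded after its contents.
                Some(mut child) if child.is_dir => {
                    let kids = std::mem::take(&mut child.children).into_iter();
                    self.stack.push((child, kids));
                }
                Some(child) => return Some(child),
                None => {
                    // Listing exhausted: pop the frame (first pop), then
                    // yield the directory itself.
                    let (dir, _) = self.stack.pop()?;
                    return Some(dir);
                }
            }
        }
    }
}

/// Mimics the predicate from the report: every directory is rejected.
fn walk_filtered(root: Node, guard_contents_first: bool) -> Vec<String> {
    let mut walker = Walker::new(root);
    let mut kept = Vec::new();
    while let Some(node) = walker.next() {
        if node.is_dir {
            if !guard_contents_first {
                // Second pop: the rejected directory's own frame is already
                // gone, so this discards the parent's remaining entries.
                walker.skip_current_dir();
            }
            continue;
        }
        kept.push(node.path);
    }
    kept
}

fn sample_tree() -> Node {
    Node::dir("/tmp/walkdir", vec![
        Node::dir("/tmp/walkdir/bar", vec![Node::file("/tmp/walkdir/bar/file")]),
        Node::dir("/tmp/walkdir/foo", vec![Node::file("/tmp/walkdir/foo/file")]),
    ])
}
```

In this model, `walk_filtered(sample_tree(), false)` keeps only /tmp/walkdir/bar/file, matching the reported output, while `walk_filtered(sample_tree(), true)` keeps both files. As noted above, a guard of this kind would not help callers that invoke skip_current_dir directly.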

Best regards.

kartonrad commented 1 year ago

This also happens with .follow_symlinks(false) when filtering out hidden entries and then encountering a hidden symlink on Windows.

filter_entry tries to remove the entry twice, which leads to the iterator exiting the current folder without looking at any other entries.

kartonrad commented 1 year ago

just ran into this by chance

kenchou commented 7 months ago

I encountered a similar issue.

test case:

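mkdir -p /tmp/test-walkdir/{a,b,c}

for entry in walkdir::WalkDir::new("/tmp/test-walkdir")
    .contents_first(true)
    .sort_by(|a, b| a.file_name().cmp(&b.file_name()))
    .into_iter()
    // filter_entry is defined on the iterator, so it must come after into_iter()
    .filter_entry(|e| e.file_name().to_string_lossy() != "a")
{
    // Each item is a Result<DirEntry, walkdir::Error>
    let entry = entry.unwrap();
    println!("{}", entry.path().display());
}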

output:

/tmp/test-walkdir

missing items b, c

Output after changing the filter to .filter_entry(|e| e.file_name().to_string_lossy() != "b"):

/tmp/test-walkdir/a
/tmp/test-walkdir

missing item c

The iterator seems to break after encountering a filtered entry.

wtachau commented 2 weeks ago

+1, I also ran into an issue with filter_entry and contents_first that looks a lot like this. The problem goes away when contents_first is disabled.