BurntSushi / walkdir

Rust library for walking directories recursively.
The Unlicense

"File name too long" error at 4096 bytes #23

Open vandenoever opened 7 years ago

vandenoever commented 7 years ago

WalkDir cannot handle long paths that find handles fine.

extern crate walkdir;
use std::fs::create_dir;
use std::env::{current_dir, set_current_dir};

fn main() {
    let dir = current_dir().unwrap();
    let name = "a";
    for i in 0..2200 {
        if i % 100 == 0 {
            println!("Create dir at level {}.", i);
        }
        current_dir().unwrap(); // this line shows that rust can handle it
        create_dir(name).unwrap();
        set_current_dir(name).unwrap();
    }

    for r in walkdir::WalkDir::new(dir) {
        let entry = r.unwrap(); // this gives an error for long paths
        let len = entry.path().to_string_lossy().len();
        if len > 4090 {
            println!("{}", len);
        }
    }
}
...
Create dir at level 2100.
4091
4093
4095
4097
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { depth: 2042, inner: Io { path: Some("/home/walkdir/a/a/.../a/a/a"), err: Error { repr: Os { code: 36, message: "File name too long" } } } }', src/libcore/result.rs:788
note: Run with `RUST_BACKTRACE=1` for a backtrace.

BurntSushi commented 7 years ago

I don't think this is a bug with walkdir. You'll probably want to reproduce this with a normal File::open call and file a bug on rust-lang/rust.

vandenoever commented 7 years ago

You're right. std::fs::read_dir is the problem.

vandenoever commented 7 years ago

More information has come up. This is a limitation of the operating system. On Linux, you can find out the maximal length for a relative path with getconf PATH_MAX / on the command-line or with the POSIX function pathconf("/", _PC_PATH_MAX).
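As a rough illustration of the pathconf call mentioned above, here is a minimal Rust sketch that queries the limit through a hand-written FFI declaration rather than the libc crate. The helper name path_max is made up for this example, and the constant 4 for _PC_PATH_MAX is an assumption that holds for glibc on Linux; other platforms use different values.

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_int, c_long};

extern "C" {
    // POSIX: long pathconf(const char *path, int name);
    fn pathconf(path: *const c_char, name: c_int) -> c_long;
}

// Assumption: value of _PC_PATH_MAX in glibc on Linux. Not portable.
const PC_PATH_MAX: c_int = 4;

/// Hypothetical helper: returns the limit reported for `path`, or `None`
/// if pathconf returns -1 (an error, or no fixed limit).
fn path_max(path: &str) -> Option<i64> {
    let c_path = CString::new(path).ok()?;
    // Safety: `c_path` is a valid NUL-terminated string for the whole call.
    let limit = unsafe { pathconf(c_path.as_ptr(), PC_PATH_MAX) };
    if limit < 0 {
        None
    } else {
        Some(i64::from(limit))
    }
}
```

Calling path_max("/") should report the same number as getconf PATH_MAX / on the same machine.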

find works around this by changing the working directory (chdir).

BurntSushi commented 7 years ago

I'm not sure walkdir should be changing the current working directory automatically. Keep in mind that this is a library, not a command line tool like find. Consider what would happen if multiple walkdir iterators were running in parallel.
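To make the concern concrete, here is a minimal sketch using only the standard library (the function name cwd_seen_by_caller is invented for this example). It changes the working directory on a worker thread and reports what the calling thread sees before and after, which shows that the working directory is shared by every thread in the process.

```rust
use std::env;
use std::io;
use std::path::PathBuf;
use std::thread;

/// Changes the working directory on a worker thread, then returns the
/// working directory observed by the *calling* thread before and after.
fn cwd_seen_by_caller(target: PathBuf) -> io::Result<(PathBuf, PathBuf)> {
    let before = env::current_dir()?;

    // The worker never hands anything back except success or failure.
    thread::spawn(move || env::set_current_dir(&target))
        .join()
        .expect("worker thread panicked")?;

    // The caller's view has changed anyway: the cwd is process-global.
    let after = env::current_dir()?;

    // Put things back so the sketch has no lasting side effect.
    env::set_current_dir(&before)?;
    Ok((before, after))
}
```

Any relative path that another thread resolves between the two set_current_dir calls would be interpreted against the wrong directory, which is the hazard for parallel iterators.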

vandenoever commented 7 years ago

Changing the working directory has overhead, so it should only be used to 'sandwich' an opendir call when that is necessary. A safe version of read_dir does not need much overhead.

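use std::env::{current_dir, set_current_dir};
use std::fs::{read_dir, ReadDir};
use std::io::Result;
use std::path::Path;

// Sketch only. `max_path` (the OS limit, e.g. PATH_MAX) and `get_close_to`
// (chdir stepwise toward `path`, returning the short remaining relative
// path) are hypothetical and not defined here.
fn read_deep_dir<P: AsRef<Path>>(path: P) -> Result<ReadDir> {
    let path = path.as_ref();
    if path.as_os_str().len() >= max_path {
        let dir = current_dir()?;
        // Only the short remainder is handed to read_dir.
        let r = get_close_to(path).and_then(|rest| read_dir(rest));
        set_current_dir(dir)?; // always restore the original working directory
        r
    } else {
        read_dir(path)
    }
}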

The path length limitation is unfortunate. Changing the working directory, even briefly, is risky.

strace tells me that nftw does not change the working directory but can handle long paths fine. It manages this by using openat:

int openat(int fd, const char *path, int oflag, ...);

openat uses a file descriptor of the parent directory.

https://doc.rust-lang.org/libc/x86_64-unknown-linux-gnu/libc/fn.openat.html
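A minimal sketch of how the openat idea could look in Rust, assuming Linux and using a hand-written FFI declaration instead of the libc crate. The names open_one and open_deep_dir are made up for this example, and the AT_FDCWD value is the Linux one. The path is opened one component at a time, each relative to the descriptor of its parent, so no single call ever passes a path longer than one file name.

```rust
use std::ffi::CString;
use std::fs::File;
use std::io;
use std::os::raw::{c_char, c_int};
use std::os::unix::ffi::OsStrExt;
use std::os::unix::io::{AsRawFd, FromRawFd};
use std::path::Path;

extern "C" {
    // POSIX: int openat(int fd, const char *path, int oflag, ...);
    fn openat(dirfd: c_int, pathname: *const c_char, flags: c_int, ...) -> c_int;
}

const AT_FDCWD: c_int = -100; // Assumption: Linux value.
const O_RDONLY: c_int = 0;

/// Opens `name` relative to the directory referred to by `dirfd`.
fn open_one(dirfd: c_int, name: &[u8]) -> io::Result<File> {
    let c_name = CString::new(name)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidInput, e))?;
    // Safety: `c_name` is a valid NUL-terminated string for the whole call.
    let fd = unsafe { openat(dirfd, c_name.as_ptr(), O_RDONLY) };
    if fd < 0 {
        return Err(io::Error::last_os_error());
    }
    // Safety: `fd` was just returned by openat and is owned by nobody else.
    Ok(unsafe { File::from_raw_fd(fd) })
}

/// Walks `path` component by component and returns a handle to the last one.
fn open_deep_dir(path: &Path) -> io::Result<File> {
    let mut current: Option<File> = None;
    for comp in path.components() {
        let dirfd = current.as_ref().map_or(AT_FDCWD, |f| f.as_raw_fd());
        // The parent handle is closed only after the child has been opened.
        current = Some(open_one(dirfd, comp.as_os_str().as_bytes())?);
    }
    current.ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "empty path"))
}
```

This only obtains a descriptor for the deep directory. Listing its entries would additionally need something like fdopendir and readdir, which is why a fork of read_dir comes up later in the thread. A real implementation would probably also pass O_DIRECTORY and O_CLOEXEC, which are left out here because their numeric values vary by platform.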

BurntSushi commented 7 years ago

@vandenoever I understand the overhead can probably be minimized. That's not really my concern. My concern is that a library shouldn't be changing process-global state. Consider what happens when there are multiple walkdir iterators running in parallel in the same process. Bad, bad things will happen.

openat seems like a better path.

vandenoever commented 7 years ago

Agreed. It'll require a fork of read_dir. Since a too-long path is so rare, read_deep_dir could simply check for errno 36 (ENAMETOOLONG) and fall back to the version with openat.
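The errno check could be sketched roughly as follows, using only the standard library. The function names are invented for this example, the fallback is passed in as a closure because the openat-based reader is not shown, and the constant 36 is the Linux value of ENAMETOOLONG (other systems use a different number).

```rust
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::Path;

// Assumption: Linux value of ENAMETOOLONG.
const ENAMETOOLONG: i32 = 36;

/// True if the OS reported "File name too long".
fn is_name_too_long(err: &io::Error) -> bool {
    err.raw_os_error() == Some(ENAMETOOLONG)
}

/// Lists entry names with the ordinary read_dir, and calls `fallback`
/// (for example an openat-based reader) only on ENAMETOOLONG.
fn read_names_with_fallback<F>(path: &Path, fallback: F) -> io::Result<Vec<OsString>>
where
    F: FnOnce(&Path) -> io::Result<Vec<OsString>>,
{
    match fs::read_dir(path) {
        Ok(entries) => entries.map(|e| e.map(|e| e.file_name())).collect(),
        Err(ref e) if is_name_too_long(e) => fallback(path),
        Err(e) => Err(e),
    }
}
```

With this shape the common case pays nothing extra: the slow path is entered only after the fast path has already failed with that specific error.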

BurntSushi commented 7 years ago

Yeah, that is unfortunate, but shouldn't be too bad.

Forking might actually be a blessing in disguise. Last time I looked (which was a while ago), I had the suspicion that I could remove an allocation or two...

BurntSushi commented 6 years ago

I did a little digging on this today, and to my amazement, fts in GNU's libc will actually execute a chdir operation.

tavianator commented 5 years ago

> I did a little digging on this today, and to my amazement, fts in GNU's libc will actually execute a chdir operation.

Yep, unless you pass FTS_NOCHDIR :)

untitaker commented 2 years ago

Relevant discussion: