BurntSushi / walkdir

Rust library for walking directories recursively.
The Unlicense
1.22k stars 107 forks source link

WalkDir sorter should accept whole dir entries instead of just file names #70

Closed jchlapinski closed 7 years ago

jchlapinski commented 7 years ago

Currently it is not easy to sort entries alphabetically with directories first (which is most often the order in which we see files in tools), since sorter for WalkDir::sort_by() only provides file names. In this PR I have changed the code, so that the sorter accepts references to entries being sorted. Example usage:

extern crate walkdir;

fn main() {
    use std::cmp::Ordering;
    use walkdir::WalkDir;

    // Paths sorted alphabetically with directories first
    for e in WalkDir::new("./").sort_by(|a, b| match b.file_type().is_dir().cmp(&a.file_type().is_dir()) {
        Ordering::Equal => a.file_name().cmp(b.file_name()),
        o @ _ => o,
    }).into_iter().filter_map(|e| e.ok()) {
        println!("{}", e.path().display());
    }
}

If this change would to be accepted, we have to also fix the documentation accordingly. Regards

BurntSushi commented 7 years ago

Nice! Have you seen the discussion in #53? If not, then I think we independently converged on the same API. :-) I'm laughing at how much of a stink I put up about this change being tricky, but it appears to have been much easier than I anticipated!

So yes, I do think we want this change. Would you like to write the docs? If not, I can do it in a later PR.

jchlapinski commented 7 years ago

@BurntSushi Sure, I will look trough the code once again before weekend and fix docs wherever needed.

BurntSushi commented 7 years ago

@jchlapinski Awesome, thank you!

jchlapinski commented 7 years ago

Nice! Have you seen the discussion in #53?

@BurntSushi hmm, indeed I did not notice that this was being done here. However I find it absolutely awkward to use tuple or introduce another public type just to ameliorate sorting. My feeling is that what I have done is not really that different to what was already there - those DirEntries are produced slightly earlier, however `fs::DirEntry::file_type() is being called early anyway to check whether the entry is a dir, a file or a symlink, so there is no additional penalty in system calls in my change, no? Or maybe I am not seeing something. I did not test this with large number of files.

BurntSushi commented 7 years ago

@jchlapinski Right, I don't think there's any performance change here. I think we are in sync; we both agree that a new public type or anything other than passing a pair of &DirEntry would be really unfortunate. The key downside here is that if you call metadata on a DirEntry, then it will always hit the file system. When sorting, the results of a single DirEntry could therefore change, which would lead to unfortunate things. However, I think we basically concluded in #53 that this was an acceptable downside, and that if really necessary, the caller could introduce their own metadata cache.

jchlapinski commented 7 years ago

@BurntSushi OK, I have looked trough the code and fixed docs and removed a few empty lines.

BurntSushi commented 7 years ago

@jchlapinski Awesome work! Thank you!