BurntSushi / walkdir

Rust library for walking directories recursively.
The Unlicense
1.24k stars 107 forks source link

Add ability to sort the walked entries #6

Closed vandenoever closed 8 years ago

vandenoever commented 8 years ago

Imagine this use-case. One wants to do a scan of all folders and perform the same scan later and obtain the list of differences.

The scanning will produce two lists that are expensive to sort. It is more efficient to have WalkDir output a sorted list.

Currently WalkDir uses the iterator from fs::read_dir. The entries are used as is.

The proposed patch adds a boolean value sort that enables sorting of the entries. This matches nicely with the behaviour of WalkDir when max_open == 1: all entries of the opened dir are read instantly. in addition, they are sorted.

A simple benchmark on a directory with one million files (warm cache) shows:

   walkdir ~          2.0s
   walkdir --sort ~   2.8s
   walkdir ~ | sort  19.5s
BurntSushi commented 8 years ago

@vandenoever Sorry for the late reply, but thanks for the PR! So, I think something like this seems desirable, but I'm concerned about this particular implementation path. In particular, it doesn't expose any way for the caller to choose their own sort function---baking in lexicographic sort seems unfortunate if we're going to pursue this type of feature.

vandenoever commented 8 years ago

How about adding a sort function? Not sure what best form is for a sort function as a member.

sort: Option(FnMut(&T)),
...
pub fn sort(mut self, f: F) -> Self where F: FnMut(&T) -> Ordering {
    self.opts.sort = Some(sort);
vandenoever commented 8 years ago

sort() now has a sorting function as argument.

vandenoever commented 8 years ago

sort_by now takes a Option<FnMut>.

vandenoever commented 8 years ago

Travis fails because of a lazy_static. It's not used directly in walkdir.

src/lib.rs:73:1: 73:11 error: no_std is experimental
src/lib.rs:73 #![no_std]
BurntSushi commented 8 years ago

I think I have just one more nit and then I'd be ready to merge! Thanks so much for sticking with me. :-)

Don't worry about Travis. It looks like lazy_static! doesn't build on Rust 1.3 any more. I'll fix that later.

vandenoever commented 8 years ago

Sure thing. Thanks for the great suggestions. The result is better for it.

vandenoever commented 8 years ago

Shall I coalesce the commits?

BurntSushi commented 8 years ago

@vandenoever Yes please! A rebase would be great.

vandenoever commented 8 years ago

Done.

BurntSushi commented 8 years ago

Fantastic, thank you!

BurntSushi commented 8 years ago

So I actually ended up hitting a bug with this. To reproduce:

$ cargo build --release --example walkdir
$ ./target/release/examples/walkdir --fd-max 1 --sort ./

Tweaking the value of --fd-max changes how the error occurs. It looks like a already closed directory iterator is trying to be closed again. It might be the case that the panic was put there because I was just being really paranoid and might be able to just be removed, but I need to think more carefully about it.

(I'm happy to fix this, but just can't do it right now.)