BurntSushi / walkdir

Rust library for walking directories recursively.
The Unlicense
1.3k stars 109 forks source link

Make `WalkDir::new` Take an `Into<PathBuf>` #186

Closed ultrabear closed 1 year ago

ultrabear commented 1 year ago

The obvious benefit of replacing AsRef<Path> with Into<PathBuf> is the internals currently only immediately convert to a PathBuf. By allowing Into<PathBuf> we can still allow most expected types but skip allocating in the case where the user has an owned String or PathBuf etc to pass in.

This change is very obviously breaking, and it does leave some casualties in its wake, highlighted below:

use std::{ffi::{OsString, OsStr}, path::{PathBuf, Path}, sync::Arc, rc::Rc};
fn impls<T: Into<PathBuf>>() {} // our basic tester

// these still work :D
impls::<String>();
impls::<&str>();

// and these :D
impls::<PathBuf>();
impls::<&Path>();
impls::<Box<Path>>();

// and these :D
impls::<OsString>();
impls::<&OsStr>();

// stuff starts breaking here :(

// These need to allocate anyways, so we can deref them to a borrowed repr and hope it goes away
impls::<Arc<str>>();
impls::<Rc<str>>();
impls::<Rc<Path>>();

// This is where it gets messy, PathBuf does not implement From for most
//   Box types (only Box<Path> is an exception really).
// Taking the deref here causes an unneeded allocation because we decay to a borrow.
// An experienced rustacean would know to use `String From<Box<str>>` and 
//   then rely on `PathBuf From<String>`, and similar for OsStr, but not
//   everyone will know to do this.
impls::<Box<str>>();
impls::<Box<OsStr>>();

Pros:

Cons:

Alternate solution

Adding another method to construct a WalkDir that specifically takes an Into<PathBuf> eases one of the cons, and removes the breakingness of this change, but I don't know what a good name would be. Open to discussion.

BurntSushi commented 1 year ago

I can at least say that there is no way I'm making this change in a semver compatible release, and I'm not keen on putting out a walkdir 3.0 at the moment, even if I thought this was a good change.

so any perf gains here are unlikely to be noticeable to the average user.

From what I can tell, this is the linchpin on which your request lies, yet it is not obviously true to me. As you point out, every DirEntry that is created performs an allocation because it owns its own PathBuf. I would be more interested in removing (or amortizing) all allocations rather than just a single one at the start of iteration, but doing so requires a rewrite of this crate (at minimum, I believe).

I honestly cannot conceive of any non-pathological (i.e., real workload with a perf difference that actually matters to an end user) scenario in which removing this allocation helps or improves anything. Do you have one that I can reproduce?

ultrabear commented 1 year ago

I honestly cannot conceive of any non-pathological (i.e., real workload with a perf difference that actually matters to an end user) scenario in which removing this allocation helps or improves anything. Do you have one that I can reproduce?

Unfortunately not, most areas I can think of would have to be unlikely edge cases or just abuse of the library.

I would be more interested in removing (or amortizing) all allocations rather than just a single one at the start of iteration, but doing so requires a rewrite of this crate (at minimum, I believe).

I think ill focus my efforts here then :), closing for now.

BurntSushi commented 1 year ago

I think ill focus my efforts here then :), closing for now.

Just a heads up here that this should be talked about in detail first if you're going to try to get it submitted here. My best guess though is that I not only won't have enough bandwidth to mentor it, but also won't be able to review it either. That might mean that your best path forward here is to fork or create a new crate.

I do have a branch, ag/sys, where I started down this path a few years ago. But I burned out. It is a surprising amount of work.