This is wasteful: we allocate a PathBuf and copy the path into it, only to destroy the original immediately afterwards. With the proposed into_path, this is possible without the extra copy.
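To make the difference concrete, here is a minimal sketch that uses only the standard library. The DirEntry below is a hypothetical stand-in for walkdir's type, reduced to a single field, and process and buffer_ptr are illustrative helpers that are not part of this pull request. The sketch compares the address of the path's heap buffer before and after each conversion, relying on the fact that moving a PathBuf transfers ownership of its buffer rather than reallocating it.

```rust
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

// Hypothetical stand-in for walkdir's DirEntry, reduced to the one field
// this example needs. The real type carries more state.
struct DirEntry {
    path: PathBuf,
}

impl DirEntry {
    /// Borrow the path; the entry keeps ownership of the buffer.
    fn path(&self) -> &Path {
        &self.path
    }

    /// Consume the entry and hand its buffer to the caller.
    fn into_path(self) -> PathBuf {
        self.path
    }
}

// Hypothetical API that wants owned paths.
fn process(paths: Vec<PathBuf>) -> usize {
    paths.len()
}

// Illustrative helper: address of the first byte of the path's buffer.
fn buffer_ptr(p: &Path) -> *const u8 {
    p.as_os_str() as *const OsStr as *const u8
}

fn main() {
    let entry = DirEntry { path: PathBuf::from("a/x.foo") };
    let original = buffer_ptr(entry.path());

    // Without into_path: a second buffer is allocated and filled.
    let copied = PathBuf::from(entry.path());
    println!("copy reuses buffer: {}", original == buffer_ptr(&copied));

    // With into_path: the entry's own buffer is handed over.
    let moved = entry.into_path();
    println!("move reuses buffer: {}", original == buffer_ptr(&moved));

    println!("paths passed on: {}", process(vec![copied, moved]));
}
```

The copy has to live in a different allocation because the original is still alive at that point, whereas the moved PathBuf keeps the original address.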
Performance
I have a small program that basically does this:
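    use std::env;
    use std::ffi::OsStr;
    use std::path::PathBuf;

    fn main() {
        // Directory to walk, taken from the first command-line argument.
        let dir = env::args().nth(1).unwrap();
        let ext = OsStr::new("foo");
        let wd = walkdir::WalkDir::new(&dir)
            .follow_links(true)
            .max_open(128);
        let paths: Vec<PathBuf> = wd
            .into_iter()
            .map(|e| e.unwrap())
            .filter(|e| e.file_type().is_file())
            // The copy under test; the noncopy variant uses e.into_path() here.
            .map(|e| PathBuf::from(e.path()))
            .filter(|p| p.extension() == Some(ext))
            .collect();
        std::process::abort();
    }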
My actual program is part of a larger program and also prints to stdout every 64 iterations. I ran this program on a directory where the iterator yields 12408 paths, with a warm page cache on Linux. Times were recorded by running the program under perf stat, 16 times for each configuration. The raw data is below, labeled copy for PathBuf::from(e.path()) and noncopy for e.into_path().
        Welch Two Sample t-test

    data:  copy and noncopy
    t = 2.6055, df = 28.297, p-value = 0.01447
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     0.000242829 0.002024684
    sample estimates:
     mean of x  mean of y
    0.02260764 0.02147388
This pull request adds fn into_path(self) -> PathBuf to DirEntry.
Motivation
The use case for this is that I have an API like this:
I would like to be able to pass in something like this:
Unfortunately that does not work, I have to make a copy:
That’s about a 5% speedup.
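The figure can be checked against the two sample means reported by the t-test, where x is copy (0.02260764) and y is noncopy (0.02147388). The short sketch below does that arithmetic; the constant and function names are illustrative only, and expressing the saving relative to the copy variant's mean is one reasonable reading of "speedup" rather than a definition taken from the measurements.

```rust
// Sample means copied from the t-test output in this description.
const MEAN_COPY: f64 = 0.02260764; // PathBuf::from(e.path())
const MEAN_NONCOPY: f64 = 0.02147388; // e.into_path()

/// Fraction of the copy variant's mean run time that the noncopy variant saves.
fn relative_saving(copy: f64, noncopy: f64) -> f64 {
    (copy - noncopy) / copy
}

fn main() {
    let difference = MEAN_COPY - MEAN_NONCOPY;
    let saving = relative_saving(MEAN_COPY, MEAN_NONCOPY);
    println!("difference in means: {:.8}", difference);
    println!("relative saving: {:.1}%", saving * 100.0); // prints 5.0%
}
```

The difference in means, 0.00113376, sits at the midpoint of the reported 95 percent confidence interval, as expected for a two-sample t-test.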