StyMaar opened this issue 6 years ago
I'm running `fdupes` right now to have an "apples to apples" comparison.
Oh, I forgot to mention which version I was running: master (b2da1856bb407339f2f8737f19bed42954d33286), built with Rust 1.19 (`cargo build --release`).
`fdupes` is pretty irregular, but faster (1–10 MB/s). The raw figures aren't really interesting (it's a RAID array with encryption, which slows things down), but I think the difference relative to other tools is relevant.
Increasing the number of threads in the thread pool (I arbitrarily chose 20) helped me reach 10 MB/s during the first part of the process (walking the directories and hashing files), and during the second part (exact file comparison) I'm currently around 40 MB/s. For the second part, I don't really know whether increasing the number of threads changed anything.
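For anyone wanting to experiment with the pool size themselves, here is a minimal sketch of hashing files on a fixed-size pool of worker threads, using only the standard library. This is not fddf's actual code: the worker count, the file list, and the "hash" (just the path length here) are all placeholders.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

// Process a batch of paths on `workers` threads pulling from a shared queue.
// The "hash" is a placeholder (path length); real code would read the file
// and feed it to an actual hasher.
fn hash_files(files: Vec<String>, workers: usize) -> Vec<(String, usize)> {
    let queue = Arc::new(Mutex::new(files));
    let (tx, rx) = mpsc::channel();
    for _ in 0..workers {
        let queue = Arc::clone(&queue);
        let tx = tx.clone();
        thread::spawn(move || loop {
            // Pop the next path; the worker exits when the queue is empty.
            let path = match queue.lock().unwrap().pop() {
                Some(p) => p,
                None => break,
            };
            let pseudo_hash = path.len(); // placeholder for a real file hash
            tx.send((path, pseudo_hash)).unwrap();
        });
    }
    drop(tx); // close the channel so the receiver loop can finish
    rx.into_iter().collect()
}

fn main() {
    let files = vec!["a.txt".to_string(), "movie.iso".to_string()];
    for (path, hash) in hash_files(files, 20) {
        println!("{} -> {}", path, hash);
    }
}
```

Bumping `workers` from 20 to 33 would be the "32 outstanding ops + 1" experiment, though with a mutex-guarded `Vec` as the queue this only controls concurrency of the hashing itself, not how the kernel schedules the reads.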
According to this benchmark my HDD (Seagate HDD 1TB, ST1000LM014) performs best when the number of outstanding IO operations is 32. Does that mean I should use a threadpool of 32+1 threads?
I'm running fddf on Debian Jessie, and the I/O read rate (shown by iotop) never goes above 3 MB/s. The task isn't CPU-bound either: ~25% on both cores. By comparison, `ls -R` reads between 10 and 15 MB per second, and so does rsync on the same workload.

The directory I'm running fddf on contains a lot of small files (text files), a big number of medium files (pictures or mp3s), and a decent number of big files (movies or .iso images).

I have no idea how file I/O works on Linux, so I don't know how to speed this up.