Closed glowinthedark closed 5 months ago
Thanks for reporting!
Can you try to use hyperfine and see the impact of the thread count on performance? Note that I threw in pdu as well, as it is usually the fastest way to iterate.
root=<path-to-measure>
hyperfine -N -w1 -M2 "gdu $root" "dua -t1 $root" "dua -t2 $root" "dua -t4 $root" "dua -t8 $root" "pdu $root"
The theory is that dua uses too many threads, which can actually hurt performance on macOS, and I noticed that 3 to 4 threads usually give the best performance. Maybe there is a number that brings it closer to gdu. Lastly, pdu is typically faster than dua and I'd expect it to be as fast as gdu or faster. Please note that it has flags for thread counts as well, in case you want to dive deeper if the results are interesting.
Also note that this uses the non-interactive version of dua, which uses the same traversal engine under the hood.
@Byron hyperfine results:
Summary
'gdu /media/t12/Music' ran
1.07 ± 0.01 times faster than 'dua -t2 /media/t12/Music'
1.13 ± 0.00 times faster than 'dua -t4 /media/t12/Music'
1.31 ± 0.02 times faster than 'dua -t8 /media/t12/Music'
1.49 ± 0.01 times faster than 'dua -t1 /media/t12/Music'
Summary
dua -t8 ~/projects ran
1.08 ± 0.00 times faster than pdu ~/projects
1.30 ± 0.00 times faster than dua -t4 ~/projects
1.50 ± 0.01 times faster than gdu ~/projects
2.16 ± 0.00 times faster than dua -t2 ~/projects
3.94 ± 0.02 times faster than dua -t1 ~/projects
The non-interactive dua mode is performing great, i.e. dua -t8 ~/projects is very fast on APFS.
The slowness is observed in interactive mode, e.g. with dua -t8 i ~/projects, which takes almost forever. Not sure what the hyperfine command for testing interactive mode would be, as I suppose it probably cannot handle tty mode (?)
Thanks for the measurements, very interesting results!
It's very interesting that gdu manages to be this much faster on Linux, and thread scaling doesn't seem to do dua much good, with -t2 being the best value on a 4-core machine.
On macOS it scales much better, but the question remains why it's slow in interactive mode.
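To put numbers on the scaling difference, here is a rough sketch that turns the "times faster than" ratios from the two hyperfine summaries into a speed-up relative to dua -t1. The helper name is made up for illustration, the ± uncertainties are ignored, and dua -t8 on ~/projects is given ratio 1.00 because it was the fastest command in that run.

```rust
/// Converts hyperfine-style slowdown ratios (run time divided by the fastest
/// run time) into a speed-up relative to the single-threaded entry.
/// Hypothetical helper, not part of dua or hyperfine.
fn speedup_over_single_thread(slowdowns: &[(u32, f64)]) -> Vec<(u32, f64)> {
    let single = slowdowns
        .iter()
        .find(|(threads, _)| *threads == 1)
        .map(|(_, ratio)| *ratio)
        .expect("a -t1 measurement is required");
    slowdowns
        .iter()
        .map(|&(threads, ratio)| (threads, single / ratio))
        .collect()
}

fn main() {
    // (threads, slowdown relative to the fastest command in that summary).
    let projects: [(u32, f64); 4] = [(1, 3.94), (2, 2.16), (4, 1.30), (8, 1.00)];
    let music: [(u32, f64); 4] = [(1, 1.49), (2, 1.07), (4, 1.13), (8, 1.31)];

    for (name, data) in [("~/projects", &projects[..]), ("/media/t12/Music", &music[..])] {
        println!("{name}");
        for (threads, speedup) in speedup_over_single_thread(data) {
            println!("  dua -t{threads}: {speedup:.2}x vs dua -t1");
        }
    }
}
```

By this estimate dua -t8 is about 3.9x faster than dua -t1 on ~/projects, while on /media/t12/Music the best setting (-t2) reaches only about 1.4x.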
I have a hunch and implemented a fix in #225, which you are invited to try out. If you'd say that the ~/projects folder has a lot of top-level entries, then my hunch might be true.
Something you could also check is how many threads gdu uses by default - it's entirely unclear to me why it's so much faster on Linux except that maybe it's related to internal inefficiencies during traversal which weigh dua down (see #224). Edit: Maybe it's also related to the HDD being less receptive to the order of traversal or something related to it due to generally higher latencies. Whatever it is that makes it faster on SSD might be what makes it slower on HDD.
PS: I have made a new release with the fix, and hope it will improve the situation, as this is the only guess I had: https://github.com/Byron/dua-cli/releases/tag/v2.27.2 . Should it still not release the handbrakes, you'd probably need to instrument a run, but we'll get there when we get there.
compiling for apple silicon on macos m2 throws an error while running cargo install dua-cli
error[E0446]: crate-private type `FilesystemScan` in public interface
--> ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/dua-cli-2.27.2/src/interactive/app/state.rs:42:5
|
27 | pub(crate) struct FilesystemScan {
| -------------------------------- `FilesystemScan` declared as crate-private
...
42 | pub scan: Option<FilesystemScan>,
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ can't leak crate-private type
For more information about this error, try `rustc --explain E0446`.
error: could not compile `dua-cli` (bin "dua") due to previous error
error: failed to compile `dua-cli v2.27.2`, intermediate artifacts can be found at `/var/folders/py/73sb2fsj37xbmtkgw111l07w0000gp/T/cargo-installoMoXeN`.
To reuse those artifacts with a future compilation, set the environment variable `CARGO_TARGET_DIR` to that path.
same error when explicitly checking out the tag (both on macos m2 and linux arm64):
git clone https://github.com/Byron/dua-cli.git && cd dua-cli
git checkout tags/v2.27.2
cargo build --release
Tried the Intel X86 binary from the releases — completes Ok:
/tmp/dua-v2.27.2-x86_64-apple-darwin/dua i ~/projects
Sort mode: size descending Total disk usage: 149.07 GB
Processed 1743246 entries in 9.81s
the original m2-binary (v2.20.1 arm64) still shows scanning apparently even after scanning finished (although the number of entries is not identical) 🤔
Entries: 1 in 0s (472/s) -> scanning <- 149.07 GB
Entries: 1743248 in 8.99s
compiling for apple silicon on macos m2 throws an error while running cargo install dua-cli
This is fixed now in main, see #226.
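For context on E0446: the compiler objects when a pub field exposes a type that is only pub(crate). A minimal sketch of one way to resolve that is to narrow the field's visibility to match the type. This is not necessarily the change made in #226; the AppState name and the entries_seen field are placeholders for illustration.

```rust
// Crate-private type, mirroring the declaration the error message points at.
#[derive(Debug, Default)]
pub(crate) struct FilesystemScan {
    // Hypothetical field, only here so the struct carries some data.
    pub(crate) entries_seen: u64,
}

// Public struct holding the scan state (placeholder name).
#[derive(Debug, Default)]
pub struct AppState {
    // Declaring this as `pub scan: Option<FilesystemScan>` would put a
    // crate-private type into a public interface, which is what E0446 reports.
    // Restricting the field to `pub(crate)` keeps both visibilities aligned.
    pub(crate) scan: Option<FilesystemScan>,
}

fn main() {
    let mut state = AppState::default();
    assert!(state.scan.is_none());

    state.scan = Some(FilesystemScan { entries_seen: 1 });
    let seen = state.scan.as_ref().map(|scan| scan.entries_seen);
    println!("entries seen so far: {:?}", seen);
}
```

The alternative is to make FilesystemScan itself pub, which fits better if the type is meant to be part of the crate's public API.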
the original m2-binary (v2.20.1 arm64) still shows scanning apparently even after scanning finished (although the number of entries is not identical) 🤔
This typically means that it is indeed still scanning, but presumably all threads are stalled. I recommend building the latest version and trying again. Let's see.
pulling, building and running latest main now makes dua -t8 i .. finish scanning in about the same time as gdu, with just ~2..3 seconds difference on macos m2 (1744024 entries in 22.25s). On linux rpi 5 arm64 8GB RAM, scanning a 765GB file system tree on an NVMe M.2 drive takes roughly the same time as gdu (723.05 GiB Processed 640603 entries in 5.25s), hard to tell the difference
thank you so much for taking the time to look into this — much appreciated! 🙏
Thanks so much for letting me know, it's much appreciated, too :).
It's great to hear that the fix did indeed work, and that gdu isn't unconditionally faster anymore :).
Closing, as it sounds like this issue is no more.
Directory scanning with dua i /some/folder takes orders of magnitude longer compared to gdu, even when setting -t <some-number-bigger-than-number-of-cpu-cores>.
Didn't do any proper benchmarks, but just as an example: while dua shows progress info with the number of scanned files around 64k, gdu in the same time on the same folder reaches around 300k+ files. dua-cli takes minutes longer to complete a full scan.
The huge speed difference has been observed with APFS (macos), HFS+ (macos), exFAT (macos, linux), EXT4 (Linux with both armv7/arm64 and intel cpu's).
Note: dua-cli is still as fast or faster than ncdu, so apparently it's gdu that does some serious optimizations to speed up the scan. On macos APFS, a gdu full scan takes less time than calling ootb Apple's Finder Get Info on the same folder.