Byron / dua-cli

View disk space usage and delete unwanted data, fast.
https://lib.rs/crates/dua-cli
MIT License

dua takes too much time and it constantly hangs #116

Closed c02y closed 2 years ago

c02y commented 2 years ago

Is there some kind of log I can check to find out why dua takes a very long time to complete? Most of the time I just kill it, since I don't know when it will finish.

gdu, by contrast, always takes only a few seconds.

Honestly, most of the time dua will just hang, roughly 8 out of 10 times (across several months, using different dua versions). I cannot move my mouse in it and cannot use Ctrl-c to quit it. I've tried both with and without -x.

[Screen recording: Peek 2021-12-24 14-59]

FYI:

  1. ArchLinux
  2. dua 2.14.7, installed from the Arch repo
  3. gdu 5.12.1, installed using go install

Byron commented 2 years ago

Thanks for posting and for the reproduction video. It's very strange to see dua hang like that. I would have thought it's due to IO hangs, but gdu doesn't seem to have such a problem.

Maybe it's something else and dua hangs due to some other issue. Since dua is single-threaded, there is no synchronization going on at all. Instead jwalk does all the heavy lifting. Fortunately we can easily find out if it's coming from dua or from jwalk by running the jwalk/example/du program like this:

git clone https://github.com/Byron/jwalk
cd jwalk
cargo run --release --example du -- /

Note that my fork of jwalk changes du to not exclude hidden files, which is what dua does.

If it works, it's probably dua causing the hangs or something about the way it configures jwalk. Otherwise, it's most definitely something related to jwalk and we can try to solve the issue there.

Please let me know what you find.

c02y commented 2 years ago

[Screen recording: Peek 2021-12-26 12-37]

BTW:

  1. I don't have any external drive mounted
  2. I don't have any other IO task running.

Byron commented 2 years ago

Thanks for trying the experiment. This shows that despite being slow, it does complete. It's hard to imagine why dua wouldn't complete or would take so long. I believe dua also does cycle checks, which don't even happen in the du example, so that shouldn't be the source of issues either.

These spurious errors about the OS being busy seem interesting, as I think they might be worth a retry, something dua doesn't currently do.
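To make the retry idea concrete, here is a minimal sketch of what such a wrapper could look like. It is not code from dua or jwalk: the helper names `retry_transient` and `is_transient` are hypothetical, and treating `Interrupted` and `WouldBlock` as the "busy" errors is an assumption, since the exact errors are only visible in the recording.

```rust
use std::io;
use std::thread;
use std::time::Duration;

/// Assumption: these error kinds are the ones worth retrying. The errors
/// actually reported in this thread may map to a different kind.
fn is_transient(err: &io::Error) -> bool {
    matches!(
        err.kind(),
        io::ErrorKind::Interrupted | io::ErrorKind::WouldBlock
    )
}

/// Run `op` at least once and at most `max_attempts` times, sleeping for
/// `delay` between attempts. Errors that are not considered transient are
/// returned immediately without a retry.
fn retry_transient<T>(
    max_attempts: u32,
    delay: Duration,
    mut op: impl FnMut() -> io::Result<T>,
) -> io::Result<T> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if attempt < max_attempts && is_transient(&err) => {
                attempt += 1;
                thread::sleep(delay);
            }
            Err(err) => return Err(err),
        }
    }
}
```

A metadata lookup could then be wrapped as `retry_transient(3, Duration::from_millis(10), || std::fs::symlink_metadata(&path))`. Note that a retry only helps with errors that are returned; it does nothing for a call that blocks and never returns.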

Could you also run dua (without the TUI) to see if this improves reliability? It has its own loop to consume the walkdir results and maybe that changes things.

Let me CC @jessegrosjean to add more experience to this thread.

c02y commented 2 years ago

[Screen recording: Peek 2021-12-26 19-46]

BTW: it seems dua doesn't handle C-c/C-d/C-z correctly, as you can see in the gif. It sometimes freezes my whole tmux panel (dua i mode), and I cannot even kill it using the kill command.

Byron commented 2 years ago

This is really interesting, as dua without the TUI doesn't meddle with signals at all. This means Ctrl+C sends a signal and the process aborts no matter what. If that's not happening, the process must be very, very stuck, probably on IO. In other words, aborting on a signal is automatic, and dua doesn't do anything to handle this because it doesn't have to.

Probably that's an important hint about what's going on here.

My hypothesis is that it will still get stuck even when using only a single thread. What happens if dua -t 1 / is invoked?

Lastly, if that indeed also gets stuck, maybe it's a problem with traversing special files in /dev that gdu might naturally avoid.

Thanks for your help

c02y commented 2 years ago

I just tried dua -t 1 /; it behaves exactly the same as dua a /:

  1. takes long time to finish
  2. IO Errors at the end
  3. C-c cannot kill it when it is running

Byron commented 2 years ago

Perfect, this truly means it's unrelated to threading (as jwalk falls back to a serial implementation then) and instead is related to trying to access special files which shouldn't be accessed or traversed.

gdu indeed handles sockets specifically, which dua does not, and probably neither does jwalk. Maybe this is where the blocking call happens. Interestingly, directories will only be opened for entries if they appear to be one, so it's hard to imagine a socket posing as a directory to cause that. Otherwise only metadata calls are made, which leads to the next experiment.

dua -t 1 -A only checks the apparent size and skips checking the file's block size, which might make a difference (but probably won't, as the metadata has already been retrieved and there is no way not to retrieve it).

c02y commented 2 years ago

dua -t 1 -A / behaves the same as dua -t 1 /: I got exactly the 3 issues listed, plus another one:

Byron commented 2 years ago

The difference in file size is due to the way it counts with -A, that's expected.

This outcome probably means that merely traversing the directory structure and querying metadata is causing the hangs.

Can you run gdu -i /foobar? I'm assuming that this turns off the default ignore directories and replaces them with one that doesn't matter.

I'd expect the gdu invocation to block, which means dua should learn how to ignore a certain set of directories by default, on Linux at least.
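As an illustration of the default-ignore idea, here is a rough sketch of a traversal that never descends into a configurable set of directories. It is not dua's or gdu's actual implementation: `total_size` and `default_ignored` are hypothetical names, and the list /proc, /dev, /sys, /run is an assumed default used only for the example.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Assumed default ignore list for Linux, for illustration only. Check the
/// documentation of the tool you use for its real defaults.
const DEFAULT_IGNORED: &[&str] = &["/proc", "/dev", "/sys", "/run"];

fn default_ignored() -> Vec<PathBuf> {
    DEFAULT_IGNORED.iter().map(|p| PathBuf::from(*p)).collect()
}

/// Sum the apparent size of all regular files below `root`.
/// Subdirectories listed in `ignored` are never opened, symlinks are not
/// followed, and entries that cannot be read are skipped silently.
/// Design choice of this sketch: `root` itself is always traversed, even
/// if it appears in `ignored`, because the caller asked for it explicitly.
fn total_size(root: &Path, ignored: &[PathBuf]) -> u64 {
    let mut total: u64 = 0;
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        let entries = match fs::read_dir(&dir) {
            Ok(entries) => entries,
            Err(_) => continue,
        };
        for entry in entries.flatten() {
            let path = entry.path();
            // symlink_metadata describes the entry itself, not its target.
            let meta = match fs::symlink_metadata(&path) {
                Ok(meta) => meta,
                Err(_) => continue,
            };
            if meta.is_dir() {
                if !ignored.iter().any(|skip| *skip == path) {
                    pending.push(path);
                }
            } else if meta.is_file() {
                total += meta.len();
            }
            // Sockets, FIFOs, devices and symlinks are not counted.
        }
    }
    total
}
```

Calling `total_size(Path::new("/"), &default_ignored())` would then skip the listed pseudo-filesystems entirely, while passing an empty slice disables the ignore list.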

Byron commented 2 years ago

A new release is also available which mirrors the same logic as gdu.

Does that work better?

c02y commented 2 years ago

Yeah, gdu -i /foobar hangs for a little while, and it ignores Ctrl-c as well.

AND I just tested the new version of dua, it works fine now, like gdu, thanks.

[Screen recording: Peek 2021-12-27 12-04]

Byron commented 2 years ago

Great to hear. Maybe one more thing: if dua turns out to be slower than gdu, it might be worth playing with the -t flag to see how many threads are actually beneficial. On my machine, for instance, performance is best with only 4 out of 10 possible threads.
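Trying each thread count by hand gets tedious, so a small helper can do the comparison. The following is only a sketch: `time_run` and `fastest` are hypothetical helpers, and it assumes nothing about dua beyond the -t flag mentioned above.

```rust
use std::io;
use std::process::{Command, Stdio};
use std::time::{Duration, Instant};

/// Run `program` with `args` once, discarding its output, and return the
/// wall-clock time the run took.
fn time_run(program: &str, args: &[String]) -> io::Result<Duration> {
    let start = Instant::now();
    Command::new(program)
        .args(args)
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()?;
    Ok(start.elapsed())
}

/// Measure every candidate and return the one with the smallest duration.
/// Candidates whose measurement fails are skipped. On a tie, the earliest
/// candidate in the slice wins.
fn fastest<F>(candidates: &[usize], mut measure: F) -> Option<(usize, Duration)>
where
    F: FnMut(usize) -> io::Result<Duration>,
{
    candidates
        .iter()
        .filter_map(|&n| measure(n).ok().map(|elapsed| (n, elapsed)))
        .min_by_key(|&(_, elapsed)| elapsed)
}
```

For example, `fastest(&[1, 2, 4, 8], |n| time_run("dua", &["-t".to_string(), n.to_string(), "/".to_string()]))` would time one run per thread count. A single run is noisy and affected by the file system cache, so averaging several runs per setting gives a more trustworthy comparison.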

c02y commented 2 years ago

I tried -t 0~10; the best one is 4, and all the others took over 2.7s.

>> time /tmp/dua -t 0 /
 670.61 GB /1464290 entries

________________________________________________________
Executed in    2.79 secs    fish           external
   usr time   22.94 secs  324.00 micros   22.94 secs
   sys time   12.23 secs   46.00 micros   12.23 secs

>> time /tmp/dua -t 4 /
 670.61 GB /1452660 entries

________________________________________________________
Executed in    2.28 secs    fish           external
   usr time    8.46 secs    0.00 micros    8.46 secs
   sys time    2.46 secs  292.00 micros    2.46 secs

BTW gdu runs faster, but that is OK, since I don't use this function frequently.

>> time gdu -ns /
625.2 GiB /

________________________________________________________
Executed in  910.33 millis    fish           external
   usr time    5.65 secs      0.00 micros    5.65 secs
   sys time    5.14 secs    252.00 micros    5.14 secs

dua vs gdu:

>> hyperfine "/tmp/dua -t 4 /" "gdu -ns /"
Benchmark 1: /tmp/dua -t 4 /
  Time (mean ± σ):      2.289 s ±  0.029 s    [User: 8.593 s, System: 2.432 s]
  Range (min … max):    2.264 s …  2.332 s    10 runs

Benchmark 2: gdu -ns /
  Time (mean ± σ):     758.9 ms ±  14.3 ms    [User: 4711.5 ms, System: 5071.2 ms]
  Range (min … max):   740.1 ms … 779.7 ms    10 runs

Summary
  'gdu -ns /' ran
    3.02 ± 0.07 times faster than '/tmp/dua -t 4 /'