birkenfeld / fddf

Fast data dupe finder
Apache License 2.0
109 stars 9 forks source link

Why is io::stdout locked for the whole duration of the consumer thread ? #7

Closed StyMaar closed 6 years ago

StyMaar commented 6 years ago

I'm fiddling a bit with the code to improve performances and at some point I got bitten by a random deadlock. After investigation, it's because stdout is locked for the whole duration of the consumer thread here, then if you try to println! anything from another thread when the consumer thread is up, you have a deadlock.

It's not a problem in production or anything, but I just found this behavior a bit counter-intuitive and I'm afraid other potential contributors could face the same problem. Is this long-living lock on stdout intentional ? Or can we remove it ?

birkenfeld commented 6 years ago

Thanks for looking at fddf! I haven't ignored your other issue, but it may be a bit before I can look at it. The lock intentional, and it's for performance - it avoids locking and unlocking the global stdout for every single print. I'd suggest using eprintln for debugging purposes. And of course a comment to that effect wouldn't hurt for others reading the code :)

StyMaar commented 6 years ago

Is the performance gain really relevant though ?

I ran a benchmark of printing to stdout on my machine, and the difference between locking out of the loop or locking inside the loop is around 100ns per printed line (less than 6% of the total amount of time spent printing to stdout).

When I ran fddf on my 3TB disk, the output was around 1 million lines : saving 100ns per lines would have saved me 0.1s, out of several hours of run ;)

birkenfeld commented 6 years ago

Sure, when you run the tool on a cold cache (or 3TB of data, which will make any cache cold :smile:).

But it does make a difference when a lot of what it does is printing - small files, lots of dupes, and all data in cache. And this use case is important: a lot of times you run the tool again and again during a cleanup.

Also, keep in mind that the performance difference is platform dependent. I definitely have seen Rust applications bottlenecked by stdout print speed, and not releasing the lock is one of the strategies that help.