Closed: EnderEyeGames closed this issue 2 months ago
It's strange you're seeing difference for straight dictionary attack with -S. wpa-sec returns always same ESSID "hashes", so we can do PBKDF2-SHA1 just once. Can you share hashcat status output with and without -S for sessions running at least 5 min? It's interesting to see Acceleration/Loops/Threads values from autotune.
It appears that the performance problems with GPU candidate generation mostly manifest at -w 1 and at the default -w 2; autotune is given wider latitude at -w 3, which (mostly) solves the issue. Performance with -S is still slightly better than without it at -w 3. This small benchmark was done with a straight dictionary attack over 9,541,899 candidates:

-w 2, no -S: 35792 H/s
-w 2, with -S: 133.2 kH/s
-w 3, no -S: 133.3 kH/s
-w 3, with -S: 140.4 kH/s

Inspection of the full output suggests that using -S allows the autotune mechanism to choose a larger number of "Loops" without causing bottlenecks. The tuning parameters chosen by -w 3 with and without -S differ vastly, even though the performance is similar. hashcat-output-benchmarking.txt
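For perspective, the -w 2 figures above imply roughly a 3.7x throughput gain from enabling -S alone (a quick arithmetic check, using only the numbers quoted in this comment):

```python
# Speedup implied by the -w 2 benchmark figures quoted above.
baseline_hs = 35_792       # H/s, -w 2 without -S
slow_cand_hs = 133_200     # H/s (133.2 kH/s), -w 2 with -S
print(f"speedup: {slow_cand_hs / baseline_hs:.1f}x")
```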
I don't trust in hashcat's own SPEED measurement.
Environment: live measurement using time on a hash file that allows hashcat to reuse the PBKDF2 result for a commonly used salt (ESSID): Recovered.Total..: 1/32799 (0.00%) Digests
w3 only:
real 1m1,107s
user 0m11,609s
sys 0m8,354s
w3 + S:
real 1m0,275s
user 0m11,433s
sys 0m8,699s
So it looks like adding -S makes things a bit slower for our workload. Will leave this open and will get back to it after core rewrite and rebase to m22000.
I'm curious as to how you arrived at the conclusion that it makes it slower when it took less "real time" to execute (01:00.275 vs 01:01.107) as well as less "user time" (although it's very odd that so much "user" and "kernel" time was taken if it's running on a GPU).
I'm sure it was a misreading. At first glance, ",275s" looks like more than ",107s".
BTW: I use time because, when running hashcat within help_crack.py, we have to account for the entire process (loading hashcat, initializing hashcat, loading the hash list and word list, terminating hashcat). Looking only at hashcat's internal measurement is a little bit out of scope here.
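The end-to-end measurement described above can be sketched in a few lines; this is an illustrative stand-in for `time` (the placeholder command here is not a real hashcat run):

```python
# Sketch: measure a cracking job end to end, as `time` does, rather than
# trusting the tool's internal speed counter. Substitute the real hashcat
# invocation for the placeholder command below.
import subprocess
import sys
import time

def timed_run(cmd):
    """Run a command and return (elapsed wall-clock seconds, exit code),
    covering startup, initialization, and shutdown, not just kernel time."""
    start = time.perf_counter()
    proc = subprocess.run(cmd)
    return time.perf_counter() - start, proc.returncode

# Placeholder command; replace with e.g. ["hashcat", "-m", "22000", ...]
elapsed, rc = timed_run([sys.executable, "-c", "pass"])
print(f"real {elapsed:.3f}s (exit {rc})")
```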
Indeed, I have noticed that when using help_crack.py, the small provided wordlists cause the initialization and autotune to take up a significant amount of time, which is why I usually use -w 2 -S rather than -w 3 for DWPA, since -w 3 takes much longer to auto-tune per dictionary. Importantly, -w 3 is not enabled by default in help_crack.py, so it would be using -w 2 without -S unless the user overrides it, which clearly results in huge bottlenecks for dGPUs. Using -S doesn't help nearly as much for CPUs or iGPUs according to small tests I've done on my laptop.
Good investigation. I did the tests on GPU and hash mode 22000 only.
Has this been resolved?
Current help_crack.py version adds dynamic rules and sets -S option to hashcat. Thanks for the idea!
In order to access the full speed of a dedicated GPU when cracking WPA2 hashes with Hashcat, it is necessary to activate the "slow candidates" option with -S. "Slow candidates" generates password candidates on the host CPU instead of the GPU, which massively improves performance in certain scenarios, such as straight dictionary attacks on dedicated GPUs.
On my main machine (Arch Linux, hashcat 6.2.5, Ryzen 5 5600X, NVIDIA GTX 970), using the slow-candidates option boosts performance from 10-20 kH/s to its benchmark score of 160 kH/s on a simple dictionary attack. This option should be enabled by default in help_crack.py, but it is currently not.
Workaround: use the -co option in help_crack.py to pass the -S option through to Hashcat.
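A minimal sketch of what passing custom options through a wrapper looks like; `build_hashcat_cmd` is a hypothetical helper for illustration, not help_crack.py's actual code:

```python
# Sketch (hypothetical): assemble a hashcat argv for WPA-PBKDF2 (mode 22000),
# appending user-supplied custom options, e.g. "-S" to enable slow candidates.
import shlex

def build_hashcat_cmd(hashfile, wordlist, custom_options=""):
    """Return the argv a wrapper might exec, with extra options appended."""
    cmd = ["hashcat", "-m", "22000", "-a", "0", hashfile, wordlist]
    cmd += shlex.split(custom_options)   # e.g. "-w 3 -S" -> ["-w", "3", "-S"]
    return cmd

print(build_hashcat_cmd("capture.hc22000", "wordlist.txt", "-w 3 -S"))
```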