Closed: EnderEyeGames closed this issue 2 months ago
It's strange you're seeing difference for straight dictionary attack with -S. wpa-sec returns always same ESSID "hashes", so we can do PBKDF2-SHA1 just once. Can you share hashcat status output with and without -S for sessions running at least 5 min? It's interesting to see Acceleration/Loops/Threads values from autotune.
It appears that the performance problems with GPU candidate generation mostly manifest at -w 1 and at the default -w 2; autotune is given wider latitude at -w 3, which (mostly) solves the issue. Performance with -S is still slightly better than without it at -w 3. This small benchmark was done with a straight dictionary attack over 9,541,899 candidates:

-w 2, no -S: 35792 H/s
-w 2, with -S: 133.2 kH/s
-w 3, no -S: 133.3 kH/s
-w 3, with -S: 140.4 kH/s

Inspection of the full output suggests that using -S allows the autotune mechanism to choose a larger number of "Loops" without causing bottlenecks. The tuning parameters chosen by -w 3 with and without -S differ vastly, even though the performance is similar. hashcat-output-benchmarking.txt
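For perspective, the -w 2 figures above imply roughly a 3.7x throughput gain from enabling -S alone (a quick arithmetic check, using only the numbers quoted in this comment):

```python
# Speedup implied by the -w 2 benchmark figures quoted above.
baseline_hs = 35_792       # H/s, -w 2 without -S
slow_cand_hs = 133_200     # H/s (133.2 kH/s), -w 2 with -S
print(f"speedup: {slow_cand_hs / baseline_hs:.1f}x")
```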
I don't trust in hashcat's own SPEED measurement.
Environment: live measurement using time on a hash file that allows hashcat to reuse the PBKDF2 result for a commonly used salt (ESSID): Recovered.Total..: 1/32799 (0.00%) Digests
w3 only:
real 1m1,107s
user 0m11,609s
sys 0m8,354s
w3 + S:
real 1m0,275s
user 0m11,433s
sys 0m8,699s
So it looks like adding -S makes things a bit slower for our workload. Will leave this open and will get back to it after core rewrite and rebase to m22000.
I'm curious as to how you arrived at the conclusion that it makes it slower when it took less "real time" to execute (01:00.275 vs 01:01.107) as well as less "user time" (although it's very odd that so much "user" and "kernel" time was taken if it's running on a GPU).
I'm sure it was a misreading. At first glance, ",275s" looks like more than ",107s".
BTW: I use time because, when running hashcat within help_crack.py, we have to account for the entire process (loading hashcat, initializing hashcat, loading the hash list and word list, terminating hashcat). Looking only at hashcat's internal measurement is a little bit out of scope here.
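The end-to-end measurement described above can be sketched in a few lines; this is an illustrative stand-in for `time` (the placeholder command here is not a real hashcat run):

```python
# Sketch: measure a cracking job end to end, as `time` does, rather than
# trusting the tool's internal speed counter. Substitute the real hashcat
# invocation for the placeholder command below.
import subprocess
import sys
import time

def timed_run(cmd):
    """Run a command and return (elapsed wall-clock seconds, exit code),
    covering startup, initialization, and shutdown, not just kernel time."""
    start = time.perf_counter()
    proc = subprocess.run(cmd)
    return time.perf_counter() - start, proc.returncode

# Placeholder command; replace with e.g. ["hashcat", "-m", "22000", ...]
elapsed, rc = timed_run([sys.executable, "-c", "pass"])
print(f"real {elapsed:.3f}s (exit {rc})")
```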
Indeed, I have noticed that when using help_crack.py, the small provided wordlists cause the initialization and autotune to take up a significant amount of time, which is why I usually use -w 2 -S rather than -w 3 for DWPA, since -w 3 takes much longer to auto-tune per dictionary. Importantly, -w 3 is not enabled by default in help_crack.py, so it would be using -w 2 without -S unless the user overrides it, which clearly results in huge bottlenecks for dGPUs. Using -S doesn't help nearly as much for CPUs or iGPUs according to small tests I've done on my laptop.
Good investigation. I did the tests on GPU and hash mode 22000 only.
Has this been resolved?
Current help_crack.py version adds dynamic rules and sets -S option to hashcat. Thanks for the idea!
In order to access the full speed of a dedicated GPU when cracking WPA2 hashes with Hashcat, it is necessary to activate the "slow candidates" option with -S. "Slow candidates" generates password candidates on the host CPU instead of the GPU, which massively improves performance in certain scenarios, such as straight dictionary attacks on dedicated GPUs.
On my main machine (Arch Linux, hashcat 6.2.5, Ryzen 5 5600X, NVIDIA GTX 970), using the slow-candidates option boosts performance from 10-20 kH/s to its benchmark score of 160 kH/s on a simple dictionary attack. This option should be enabled by default in help_crack.py, but it is currently not.
Workaround: use the -co option in help_crack.py to pass the -S option through to Hashcat.
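A minimal sketch of what passing custom options through a wrapper looks like; `build_hashcat_cmd` is a hypothetical helper for illustration, not help_crack.py's actual code:

```python
# Sketch (hypothetical): assemble a hashcat argv for WPA-PBKDF2 (mode 22000),
# appending user-supplied custom options, e.g. "-S" to enable slow candidates.
import shlex

def build_hashcat_cmd(hashfile, wordlist, custom_options=""):
    """Return the argv a wrapper might exec, with extra options appended."""
    cmd = ["hashcat", "-m", "22000", "-a", "0", hashfile, wordlist]
    cmd += shlex.split(custom_options)   # e.g. "-w 3 -S" -> ["-w", "3", "-S"]
    return cmd

print(build_hashcat_cmd("capture.hc22000", "wordlist.txt", "-w 3 -S"))
```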