P-H-C / phc-winner-argon2

The password hash Argon2, winner of PHC
Other
4.82k stars 411 forks source link

Benchmarks/timings are a little misleading #204

Open samscott89 opened 7 years ago

samscott89 commented 7 years ago

Hi,

Just a note that currently the benchmark timings (and similarly the output of the CLI tool) are perhaps a bit confusing, by reporting the CPU time instead of wall.

This is probably intentional, but there could perhaps be a note somewhere, or an option to display both.

For example:

$ ./bench
# snipped...
Argon2i 3 iterations  2048 MiB 1 threads:  6.92 cpb 14183.24 Mcycles 
5.8276 seconds

Argon2i 3 iterations  2048 MiB 2 threads:  4.40 cpb 9007.43 Mcycles 
6.6145 seconds

Argon2i 3 iterations  2048 MiB 4 threads:  2.70 cpb 5523.75 Mcycles 
6.9787 seconds

Argon2i 3 iterations  2048 MiB 8 threads:  1.92 cpb 3925.02 Mcycles 
10.2185 seconds

$  echo -n "password" | ./argon2 somesalt -t 3 -m 21 -p 1 -l 32
Type:       Argon2i
Iterations: 3 
Memory:     2097152 KiB
Parallelism:    1 
Hash:       d8575d9abce23ce8b4f71c281b922e3cb030e615e84e0645eb287f600f2ee6ba
Encoded:    $argon2i$v=19$m=2097152,t=3,p=1$c29tZXNhbHQ$2FddmrziPOi09xwoG5IuPLAw5hXoTgZF6yh/YA8u5ro
5.965 seconds
Verification ok

$ echo -n "password" | ./argon2 somesalt -t 3 -m 21 -p 8 -l 32
Type:       Argon2i
Iterations: 3 
Memory:     2097152 KiB
Parallelism:    8 
Hash:       abef5665eb19c0116e738a6b679cee8b522548c0ef96323cfe6fdb13876475db
Encoded:    $argon2i$v=19$m=2097152,t=3,p=8$c29tZXNhbHQ$q+9WZesZwBFuc4prZ5zui1IlSMDvljI8/m/bE4dkdds
10.087 seconds
Verification ok

From which it appears as though it takes longer with more threads. When the actual wall time is significantly shorter:

$ echo -n "password" | /usr/bin/time -p ./argon2 somesalt -t 3 -m 21 -p 1 -l 32 -r
d8575d9abce23ce8b4f71c281b922e3cb030e615e84e0645eb287f600f2ee6ba
real 5.39
user 5.88
sys 0.21

$ echo -n "password" | /usr/bin/time -p ./argon2 somesalt -t 3 -m 21 -p 8 -l 32 -r
abef5665eb19c0116e738a6b679cee8b522548c0ef96323cfe6fdb13876475db
real 1.47
user 9.87
sys 0.36
solardiz commented 7 years ago

This undocumented reporting of CPU instead of real time also makes Argon2 appear to be faster than it actually is under the current system load - which is especially relevant when benchmarking multiple concurrent instances using a script around the supplied argon2 program. With this reporting, it'd appear that running more concurrent instances than there are logical CPUs in the system produces higher cumulative speed, but this is of course not true.

The program should say "seconds of CPU time" or the like (not just "seconds"), or alternatively the uses of clock() would need to be replaced with times() (its return value is real time, even though it reports CPU times in the supplied structure, which we don't have to use) and CLOCKS_PER_SEC with sysconf(_SC_CLK_TCK) (not CLK_TCK, which on modern systems might be undefined or wrong). Better yet, both real and CPU times could be reported (with the latter obtained by summing the fields returned by times() then since we'd be calling it anyway, and we'd cache sysconf(_SC_CLK_TCK) in a variable since it'd be used multiple times), with clear wording.

vault-thirteen commented 12 months ago

This is because benchmark is using the RDTSC CPU instruction. Time calculation should be performed by the environmental code (by an O.S. calling this library), not by the library's low-level C code itself.

Also see this: https://github.com/P-H-C/phc-winner-argon2/issues/368

solardiz commented 12 months ago

@vault-thirteen No, that's a separate issue. This one is about the CPU time reporting vs. users' expectations of real time being reported. The readings for this are obtained from the OS, via clock(). The code also uses RDTSC for other reporting, but this issue isn't about that.