ColinIanKing / stress-ng

This is the stress-ng upstream project git repository. stress-ng will stress test a computer system in various selectable ways. It was designed to exercise various physical subsystems of a computer as well as the various operating system kernel interfaces.
https://github.com/ColinIanKing/stress-ng
GNU General Public License v2.0
1.77k stars 283 forks source link

af-alg stress test is using an algorithm that is not listed in /proc/crypto (hmac(sha3-384): bind failed) #418

Closed talhaHavadar closed 1 month ago

talhaHavadar commented 2 months ago

Hello Colin,

As summarized in the title it looks like it is trying to bind a socket with an algorithm that is not listed in /proc/crypto, I am not sure if this is expected or maybe I am missing a configuration in my system, please check the logs below for details, I would be grateful if you can help me on this matter.

stress-ng: debug: [2530] invoked with 'stress-ng --af-alg 0 --af-alg-dump --timeout 30 --skip-silent --verbose' by user 1000 'ubuntu' -stress-ng: debug: [2530] stress-ng 0.17.06 invoked g --timeout 30 --skip-silent --verbose
stress-ng: debug: [2530] system: Linux kria 6.8.0-1005-xilinx #6-Ubuntu SMP PREEMPT_DYNAMIC Thu Jun 20 04:37:47 UTC 2024 aarch64, gcc 13.2.0, glibc 2.39
stress-ng: debug: [2530] RAM total: 3.8G, RAM free: 2.5G, swap free: 0.0
stress-ng: debug: [2530] temporary file path: '/home/ubuntu', filesystem type: ext2 (1975996 blocks available)
stress-ng: debug: [2530] CPUs have 2 idle states: WFI, cpu-sleep-0
stress-ng: debug: [2530] 4 processors online, 4 processors configured
stress-ng: info: [2530] setting to a 30 secs run per stressor
stress-ng: debug: [2530] cache allocate: using defaults, cannot determine cache level details
stress-ng: debug: [2530] cache allocate: shared cache buffer size: 2048K
stress-ng: info: [2530] dispatching hogs: 4 af-alg
stress-ng: debug: [2530] starting stressors
stress-ng: debug: [2531] af-alg: [2531] started (instance 0 on CPU 3)
stress-ng: debug: [2532] af-alg: [2532] started (instance 1 on CPU 2)
stress-ng: info: [2531] af-alg: dumping cryptographic algorithms found in /proc/crypto to stdout
{ .crypto_type = CRYPTO_AEAD, .type = "aead", .name = "gcm(aes)", .block_size = 1, .max_auth_size = 16, .iv_size = 12 },
{ .crypto_type = CRYPTO_AEAD, .type = "aead", .name = "rfc4106(gcm(aes))", .block_size = 1, .max_auth_size = 16, .iv_size = 8 },
{ .crypto_type = CRYPTO_AKCIPHER, .type = "akcipher", .name = "pkcs1pad(rsa,sha512)" },
{ .crypto_type = CRYPTO_AKCIPHER, .type = "akcipher", .name = "rsa" },
{ .crypto_type = CRYPTO_CIPHER, .type = "cipher", .name = "aes", .block_size = 16, .max_key_size = 32 },
{ .crypto_type = CRYPTO_CIPHER, .type = "cipher", .name = "cipher_null", .block_size = 1 },
{ .crypto_type = CRYPTO_RNG, .type = "rng", .name = "jitterentropy_rng" },
{ .crypto_type = CRYPTO_RNG, .type = "rng", .name = "stdrng" },
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "blake2b-160", .block_size = 128, .digest_size = 20 },
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "blake2b-256", .block_size = 128, .digest_size = 32 },
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "blake2b-384", .block_size = 128, .digest_size = 48 },
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "blake2b-512", .block_size = 128, .digest_size = 64 },
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "cbcmac(aes)", .block_size = 1, .digest_size = 16 },
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "cmac(aes)", .block_size = 16, .digest_size = 16 },
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "crc32c", .block_size = 1, .digest_size = 4 },
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "crc64-rocksoft", .block_size = 1, .digest_size = 8 },
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "crct10dif", .block_size = 1, .digest_size = 2 },
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "digest_null", .block_size = 1 },
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "ghash", .block_size = 16, .digest_size = 16 },
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "hmac(sha1)", .block_size = 64, .digest_size = 20 },
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "hmac(sha256)", .block_size = 64, .digest_size = 32 },
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "md5", .block_size = 64, .digest_size = 16 },
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "polyval", .block_size = 16, .digest_size = 16 },
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "sha1", .block_size = 64, .digest_size = 20 },
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "sha224", .block_size = 64, .digest_size = 28 },
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "sha256", .block_size = 64, .digest_size = 32 },
stress-ng: debug: [2533] af-alg: [2533] started (instance 2 on CPU 0)
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "sha3-224", .block_size = 144, .digest_size = 28 },
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "sha3-256", .block_size = 136, .digest_size = 32 },
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "sha3-384", .block_size = 104, .digest_size = 48 },
stress-ng: debug: [2530] 4 stressors started
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "sha3-512", .block_size = 72, .digest_size = 64 },
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "sha384", .block_size = 128, .digest_size = 48 },
stress-ng: debug: [2534] af-alg: [2534] started (instance 3 on CPU 3)
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "sha512", .block_size = 128, .digest_size = 64 },
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "xcbc(aes)", .block_size = 16, .digest_size = 16 },
{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "xxhash64", .block_size = 32, .digest_size = 8 },
{ .crypto_type = CRYPTO_SKCIPHER, .type = "skcipher", .name = "cbc(aes)", .block_size = 16, .max_key_size = 32, .iv_size = 16 },
{ .crypto_type = CRYPTO_SKCIPHER, .type = "skcipher", .name = "ctr(aes)", .block_size = 1, .max_key_size = 32, .iv_size = 16 },
{ .crypto_type = CRYPTO_SKCIPHER, .type = "skcipher", .name = "cts(cbc(aes))", .block_size = 16, .max_key_size = 32, .iv_size = 16 },
{ .crypto_type = CRYPTO_SKCIPHER, .type = "skcipher", .name = "ecb(aes)", .block_size = 16, .max_key_size = 32 },
{ .crypto_type = CRYPTO_SKCIPHER, .type = "skcipher", .name = "ecb(cipher_null)", .block_size = 1 },
{ .crypto_type = CRYPTO_SKCIPHER, .type = "skcipher", .name = "essiv(cbc(aes),sha256)", .block_size = 16, .max_key_size = 32, .iv_size = 16 },
{ .crypto_type = CRYPTO_SKCIPHER, .type = "skcipher", .name = "xctr(aes)", .block_size = 1, .max_key_size = 32, .iv_size = 16 },
{ .crypto_type = CRYPTO_SKCIPHER, .type = "skcipher", .name = "xilinx-zynqmp-rsa", .block_size = 64, .max_key_size = 1024, .iv_size = 1 },
{ .crypto_type = CRYPTO_SKCIPHER, .type = "skcipher", .name = "xts(aes)", .block_size = 16, .max_key_size = 64, .iv_size = 16 },
stress-ng: info: [2531] af-alg: 43 cryptographic algorithms found in /proc/crypto
stress-ng: info: [2531] af-alg: 174 cryptographic algorithms in total (with defconfigs)
stress-ng: fail: [2531] af-alg: hmac(sha3-384): bind failed, errno=22 (Invalid argument)
stress-ng: fail: [2533] af-alg: hmac(sha3-384): bind failed, errno=22 (Invalid argument)
stress-ng: fail: [2532] af-alg: hmac(sha3-384): bind failed, errno=22 (Invalid argument)
stress-ng: fail: [2534] af-alg: hmac(sha3-384): bind failed, errno=22 (Invalid argument)
Assert occurred from file xsecure_sha.c at line 391

then the system hangs.

ColinIanKing commented 2 months ago

This is defined in the source file stress-af-alg-defconfigs.h in line 77:

{ .crypto_type = CRYPTO_SHASH, .type = "shash", .name = "hmac(sha3-384)", .block_size = 104, .digest_size = 48 },

..so the af-alg stressor will attempt to exercise this even if it's not built into your kernel as a module. The stressor is regularly updated with new af-alg crypto algorithms found in newer kernels. The older ones that are deprecated are not removed as stress-ng has to support a wide (age) range of kernels.

I do think that the errno=22 should be handled better by stress-ng, I suspect I should make it that it shows this as a info/warning message rather than a hard fail.

However, a system hang is not expected behaviour, so in one way stress-ng has probably found a kernel issue, which is part of it's testing remit.

Perhaps running strace -f on stress-ng may show the exact system call being used that triggers the hang.

talhaHavadar commented 2 months ago

Colin thank you for your input. Strangely it doesnt hang when I use it with strace -f please see the logs for run 3 and 4 for details. But when I run it with strace -o outputfile -f then it hangs again and it prints out softlockup messages from kernel for details please see logs for run 5, I tried to recover strace logs for the run 5 but coudnt do it only got journal which provides little information at the end. (and there is nothing inside /var/crash)

I will also discuss this with kernel team seems like we have a sneaky bug.

strace_run_5_journal.log strace_run_5_console.log strace_run_4_console.log strace_run_3_console.log

ColinIanKing commented 2 months ago

running strace will serialize the system calls so it may explain why it's harder to trigger when being strace'd. This implies that the issue is a race condition.

talhaHavadar commented 1 month ago

Hello Colin,

we did some updates in kernel and it seems like it is somehow improved but it still hangs without strace.

Before it was hanging with strace -f -o strace.log but now it finishes the job with failed test results, see below:

stress-ng: fail: [2219] af-alg: hmac(sha3-384): bind failed, errno=22 (Invalid argument)
info: 5 failures reached, aborting stress process
stress-ng: debug: [2219] af-alg: [2219] exited (instance 0 on CPU 0)
stress-ng: error: [2218] af-alg: [2219] terminated with an error, exit status=2 (stressor failed)
stress-ng: debug: [2218] af-alg: [2219] terminated (stressor failed)
stress-ng: error: [2218] af-alg: [2220] terminated with an error, exit status=2 (stressor failed)
stress-ng: debug: [2218] af-alg: [2220] terminated (stressor failed)
stress-ng: error: [2218] af-alg: [2221] terminated with an error, exit status=2 (stressor failed)
stress-ng: debug: [2218] af-alg: [2221] terminated (stressor failed)
stress-ng: error: [2218] af-alg: [2222] terminated with an error, exit status=2 (stressor failed)
stress-ng: debug: [2218] af-alg: [2222] terminated (stressor failed)
stress-ng: debug: [2218] metrics-check: all stressor metrics validated and sane
stress-ng: info: [2218] skipped: 0
stress-ng: info: [2218] passed: 0
stress-ng: info: [2218] failed: 4: af-alg (4)
stress-ng: info: [2218] metrics untrustworthy: 0
stress-ng: info: [2218] unsuccessful run completed in 16.05 secs

But without strace it still hangs and this time we are not seeing an error message from kernel:

ubuntu@kria:~$ stress-ng --af-alg 0 --timeout 30 --skip-silent --verbose
stress-ng: debug: [1521] invoked with 'stress-ng --af-alg 0 --timeout 30 --skip-silent --verbose' by user 1000 'ubuntu'
stress-ng: debug: [1521] stress-ng 0.17.06
stress-ng: debug: [1521] system: Linux kria 6.8.0-1007-xilinx #8-Ubuntu SMP PREEMPT_DYNAMIC Mon Sep  9 08:33:13 UTC 2024 aarch64, gcc 13.2.0, glibc 2.39
stress-ng: debug: [1521] RAM total: 3.8G, RAM free: 3.2G, swap free: 0.0
stress-ng: debug: [1521] temporary file path: '/home/ubuntu', filesystem type: ext2 (2907056 blocks available)
stress-ng: debug: [1521] CPUs have 2 idle states: WFI, cpu-sleep-0
stress-ng: debug: [1521] 4 processors online, 4 processors configured
stress-ng: info:  [1521] setting to a 30 secs run per stressor
stress-ng: debug: [1521] cache allocate: using defaults, cannot determine cache level details
stress-ng: debug: [1521] cache allocate: shared cache buffer size: 2048K
stress-ng: info:  [1521] dispatching hogs: 4 af-alg
stress-ng: debug: [1521] starting stressors
stress-ng: debug: [1527] af-alg: [1527] started (instance 0 on CPU 1)
stress-ng: debug: [1528] af-alg: [1528] started (instance 1 on CPU 3)
stress-ng: debug: [1529] af-alg: [1529] started (instance 2 on CPU 0)
stress-ng: info:  [1527] af-alg: 47 cryptographic algorithms found in /proc/crypto
stress-ng: info:  [1527] af-alg: 177 cryptographic algorithms in total (with defconfigs)
stress-ng: debug: [1521] 4 stressors started
stress-ng: debug: [1531] af-alg: [1531] started (instance 3 on CPU 0)
stress-ng: fail:  [1528] af-alg: hmac(sha3-384): bind failed, errno=22 (Invalid argument)
stress-ng: fail:  [1527] af-alg: hmac(sha3-384): bind failed, errno=22 (Invalid argument)
stress-ng: fail:  [1531] af-alg: hmac(sha3-384): bind failed, errno=22 (Invalid argument)
stress-ng: fail:  [1529] af-alg: hmac(sha3-384): bind failed, errno=22 (Invalid argument)
Assert occurred from file xsecure_sha.c at line 391

what do you think about this Assert occurred from file xsecure_sha.c at line 391 message? Do you know if there is a way to get more information about the hang?

ColinIanKing commented 1 month ago

xsecure_sha.c is not part of stress-ng or in the generic linux kernel, so I'm not sure where that assert "Assert occurred from file xsecure_sha.c at line 391" message is coming from. Perhaps that's an extra crypto driver in your specific kernel.

One can dump all the names of all the known crypto engines used by the af-alg stressor by using:

stress-ng --af-alg 1 -t 1 --af-alg-dump

..maybe this will show some information pertinent to the xsecure_sha driver.

ColinIanKing commented 1 month ago

I suspect this is not a stress-ng issue per-se, so I'm closing this for now. If you believe it is a stress-ng issue, please feel free to reopen it.