Closed Cypresslin closed 2 years ago
Thanks for the report, I will investigate this. Can you attach strace to the running stressors to see if there is a specific system call it gets locked up on? e.g. sudo strace -p stress-ng-process-id
All of the running processes are blocked in rt_sigsuspend([], 8...)
Fix committed:
commit cacea4982733e02af4e29e6d2dbd3d687af2b89b (HEAD -> master)
Author: Colin Ian King colin.i.king@gmail.com
Date: Mon Nov 14 12:54:51 2022 +0000

    stress-syscall: terminate sigsuspend syscall child proceses
Can you apply this to your repo? It may need a small amount of wiggling to apply cleanly. I was able to reproduce the hang on a 24 thread ARM dev box, and with the fix it no longer occurs.
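For readers following along, here is a minimal sketch of the kind of cleanup the commit title describes: a child that blocks in sigsuspend() with an empty mask never returns on its own, so the parent has to kill it and reap it. This is not the actual stress-ng patch; the helper names spawn_suspended_child and terminate_child are hypothetical, and the choice of SIGKILL is an assumption made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that blocks in sigsuspend() with an
 * empty signal mask, which shows up in strace as rt_sigsuspend([], 8).
 * Returns the child's pid in the parent, or -1 if fork() failed.
 */
static pid_t spawn_suspended_child(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        sigset_t mask;

        sigemptyset(&mask);
        sigsuspend(&mask);  /* blocks until a signal is delivered */
        _exit(0);
    }
    return pid;
}

/*
 * Hypothetical helper: kill the blocked child and reap it so that no
 * process is left behind. Returns the signal that terminated the child,
 * 0 if it exited normally, or -1 on error.
 */
static int terminate_child(pid_t pid)
{
    int status;

    if (kill(pid, SIGKILL) < 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

A caller would pair the two: take the pid from spawn_suspended_child(), then pass it to terminate_child() once the test is done. Without the kill and waitpid step the child stays blocked indefinitely, which matches the left-over processes described in the report.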
Hey Colin, that's super fast. I have verified this fix on one of our zVM instances and the hang no longer occurs. Thank you! Sam
Hi Colin,
I found this issue while testing the V0.14.6 update.
This can be spotted on some of our testing nodes with Ubuntu Jammy 5.15.0-53.59, including:
Although the test suite finishes without any error, some left-over processes remain and prevent our autotest framework process from finishing cleanly:
This makes the Jenkins job hang until it is eventually killed by the timeout setting on Jenkins.
It looks like the cause is the syscall stressor, and my bisect result suggests the same:
Thanks