ColinIanKing / stress-ng

This is the stress-ng upstream project git repository. stress-ng will stress test a computer system in various selectable ways. It was designed to exercise various physical subsystems of a computer as well as the various operating system kernel interfaces.
https://github.com/ColinIanKing/stress-ng
GNU General Public License v2.0

rtc test failed on PowerPC with Ubuntu F~N (0.17.08) #383

Closed Cypresslin closed 6 months ago

Cypresslin commented 6 months ago

Hi Colin, with stress-ng 0.17.08 (gb7c7a5877501) we found that the rtc test fails on both a P8 VM and P9 bare metal. Here is the output from Focal:

 Running '/home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_single_smoke_test.sh rtc'
 Free memory: 2970 MB
 Memory used: 2458 MB
 Using cgroup version 1
 /home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_single_smoke_test.sh: line 32: [: too many arguments

 Machine Configuration
 Physical Pages:  64761
 Pages available: 47467
 Page Size:       65536
 Zswap enabled:   Y

 Free memory:
               total        used        free      shared  buff/cache   available
 Mem:        4144704      526848     3041088       12480      576768     3047872
 Swap:       1048512        8000     1040512

 Number of CPUs: 2
 Number of CPUs Online: 2

 Maximum bogo ops: 3000

 rtc STARTING
 rtc RETURNED 2
 rtc FAILED
 stress-ng: debug: [131299] invoked with './stress-ng -v -t 5 --rtc 4 --rtc-ops 3000 --ignite-cpu --syslog --verbose --verify --oomable' by user 0 'root'
 stress-ng: debug: [131299] stress-ng 0.17.08 gb7c7a5877501
 stress-ng: debug: [131299] system: Linux kt-f-l-gen-5-4-c2r4d20-u-stress-smk-test-ppc64el 5.4.0-186-generic #206-Ubuntu SMP Fri Apr 26 12:30:51 UTC 2024 ppc64le, gcc 9.4.0, glibc 2.31, little endian
 stress-ng: debug: [131299] RAM total: 4.0G, RAM free: 2.9G, swap free: 1016.1M
 stress-ng: debug: [131299] temporary file path: '/home/ubuntu/autotest/client/tmp/ubuntu_stress_smoke_test/src/stress-ng', filesystem type: ext2 (3835752 blocks available)
 stress-ng: debug: [131299] CPUs have 2 idle states: Shared Cede, snooze
 stress-ng: debug: [131299] 2 processors online, 2 processors configured
 stress-ng: info:  [131299] setting to a 5 secs run per stressor
 stress-ng: debug: [131299] cache allocate: using cache maximum level L1
 stress-ng: debug: [131299] CPU data cache: L1: 32K
 stress-ng: debug: [131299] cache allocate: shared cache buffer size: 32K
 stress-ng: info:  [131299] dispatching hogs: 4 rtc
 stress-ng: debug: [131299] starting stressors
 stress-ng: debug: [131299] 4 stressors started
 stress-ng: debug: [131303] rtc: [131303] started (instance 3 on CPU 0)
 stress-ng: debug: [131302] rtc: [131302] started (instance 2 on CPU 1)
 stress-ng: fail:  [131302] rtc: ioctl RTC_ALRM_READ failed, errno=22 (Invalid argument)
 stress-ng: debug: [131302] rtc: [131302] exited (instance 2 on CPU 1)
 stress-ng: debug: [131300] rtc: [131300] started (instance 0 on CPU 1)
 stress-ng: fail:  [131303] rtc: ioctl RTC_ALRM_READ failed, errno=22 (Invalid argument)
 stress-ng: debug: [131303] rtc: [131303] exited (instance 3 on CPU 0)
 stress-ng: debug: [131301] rtc: [131301] started (instance 1 on CPU 1)
 stress-ng: fail:  [131301] rtc: ioctl RTC_ALRM_READ failed, errno=22 (Invalid argument)
 stress-ng: debug: [131301] rtc: [131301] exited (instance 1 on CPU 1)
 stress-ng: debug: [131300] rtc: [131300] exited (instance 0 on CPU 0)
 stress-ng: debug: [131299] rtc: [131300] terminated (success)
 stress-ng: error: [131299] rtc: [131301] terminated with an error, exit status=2 (stressor failed)
 stress-ng: debug: [131299] rtc: [131301] terminated (stressor failed)
 stress-ng: error: [131299] rtc: [131302] terminated with an error, exit status=2 (stressor failed)
 stress-ng: debug: [131299] rtc: [131302] terminated (stressor failed)
 stress-ng: error: [131299] rtc: [131303] terminated with an error, exit status=2 (stressor failed)
 stress-ng: debug: [131299] rtc: [131303] terminated (stressor failed)
 stress-ng: debug: [131299] metrics-check: all stressor metrics validated and sane
 stress-ng: info:  [131299] skipped: 0
 stress-ng: info:  [131299] passed: 1: rtc (1)
 stress-ng: info:  [131299] failed: 3: rtc (3)
 stress-ng: info:  [131299] metrics untrustworthy: 0
 stress-ng: info:  [131299] unsuccessful run completed in 0.42 secs

 Summary:
   Stressors run: 1
   Skipped: 0, 
   Failed:  1,  rtc
   Oopsed:  0, 
   Oomed:   0, 
   Passed:  0, 
   Badret:  0, 

 Tests took 1 seconds to run

I haven't had a chance to bisect yet, but 0.17.06 is good. Let me know if you need any other information.

Cypresslin commented 6 months ago

Bisect shows:

21d5baad74c70e42bb5f9e28b853bf97e8c26ea3 is the first bad commit
commit 21d5baad74c70e42bb5f9e28b853bf97e8c26ea3
Author: Colin Ian King <colin.i.king@gmail.com>
Date:   Tue Apr 9 19:21:44 2024 +0100

    stress-rtc: ensure EXIT_FAILURE is returned on failures

    Signed-off-by: Colin Ian King <colin.i.king@gmail.com>

 stress-rtc.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

Looks like this is catching something we should catch?

ColinIanKing commented 6 months ago

I wonder if this is occurring because these drivers/kernels don't support the RTC_ALRM_READ ioctl?

ColinIanKing commented 6 months ago

Please can you test using the following commit:

https://github.com/ColinIanKing/stress-ng/commit/dd12610368424dec857762ea5598eb7378cfe40b

commit dd12610368424dec857762ea5598eb7378cfe40b (HEAD -> master)
Author: Colin Ian King <colin.i.king@gmail.com>
Date:   Wed May 8 13:48:53 2024 +0100

stress-rtc: ignore RTC_ALRM_READ EINVAL errors if the ioctl is not implemented

Older kernels may not support the RTC_ALRM_READ ioctl, so silently ignore
EINVAL errors.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>

Cypresslin commented 6 months ago

Hi Colin, with stress-ng head set to dd12610, the build fails on an Ubuntu Noble 6.8 PowerPC VM: commit 87c7e61 causes an undefined reference to `shim_lsm_set_self_attr' and commit d326138 causes an undefined reference to `stress_asm_ppc64_yield'.

So I tested dd12610 on top of V0.17.08 instead, and the RTC_ALRM_READ ioctl failure is no longer reported. But another one pops up:

stress-ng: debug: [68774] stress-ng 0.17.08 gec64c89a5eb8
stress-ng: debug: [68774] system: Linux kernel-P10d-LPAR10 6.8.0-31-generic #31-Ubuntu SMP Sat Apr 20 00:05:55 UTC 2024 ppc64le, gcc 13.2.0, glibc 2.39, little endian
stress-ng: debug: [68774] RAM total: 15.4G, RAM free: 12.2G, swap free: 5.0G
stress-ng: debug: [68774] temporary file path: '/home/ubuntu/stress-ng', filesystem type: ext2 (10049225 blocks available)
stress-ng: debug: [68774] CPUs have 2 idle states: Shared Cede, snooze
stress-ng: debug: [68774] 16 processors online, 16 processors configured
stress-ng: info:  [68774] setting to a 5 secs run per stressor
stress-ng: debug: [68774] cache allocate: using cache maximum level L1
stress-ng: debug: [68774] CPU data cache: L1: 32K
stress-ng: debug: [68774] cache allocate: shared cache buffer size: 64K (LLC size x 2 NUMA nodes)
stress-ng: info:  [68774] dispatching hogs: 4 rtc
stress-ng: debug: [68774] starting stressors
stress-ng: debug: [68775] rtc: [68775] started (instance 0 on CPU 14)
stress-ng: debug: [68776] rtc: [68776] started (instance 1 on CPU 0)
stress-ng: debug: [68774] 4 stressors started
stress-ng: debug: [68777] rtc: [68777] started (instance 2 on CPU 15)
stress-ng: debug: [68778] rtc: [68778] started (instance 3 on CPU 14)
stress-ng: fail:  [68775] rtc: ioctl RTC_WKALRM_RD failed, errno=22 (Invalid argument)
stress-ng: debug: [68775] rtc: [68775] exited (instance 0 on CPU 14)
stress-ng: fail:  [68777] rtc: ioctl RTC_WKALRM_RD failed, errno=22 (Invalid argument)
stress-ng: debug: [68777] rtc: [68777] exited (instance 2 on CPU 15)
stress-ng: error: [68774] rtc: [68775] terminated with an error, exit status=2 (stressor failed)
stress-ng: debug: [68774] rtc: [68775] terminated (stressor failed)
stress-ng: debug: [68776] rtc: [68776] exited (instance 1 on CPU 5)
stress-ng: debug: [68778] rtc: [68778] exited (instance 3 on CPU 0)
stress-ng: debug: [68774] rtc: [68776] terminated (success)
stress-ng: error: [68774] rtc: [68777] terminated with an error, exit status=2 (stressor failed)
stress-ng: debug: [68774] rtc: [68777] terminated (stressor failed)
stress-ng: debug: [68774] rtc: [68778] terminated (success)
stress-ng: debug: [68774] metrics-check: all stressor metrics validated and sane
stress-ng: info:  [68774] skipped: 0
stress-ng: info:  [68774] passed: 2: rtc (2)
stress-ng: info:  [68774] failed: 2: rtc (2)

This behaviour can be reproduced on mainline 6.9.0-060900rc5-generic with the same PowerPC VM as well.
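The RTC_WKALRM_RD failure has the same shape as the earlier RTC_ALRM_READ one: the ioctl returns an error with errno=22 (EINVAL). A minimal sketch of how the approach described in the dd12610 commit message (silently ignore EINVAL) might be applied uniformly to each RTC ioctl is shown here. `rtc_ioctl_is_failure` is a hypothetical helper for illustration, not the stress-ng implementation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Decide whether an RTC ioctl result should count as a stressor failure.
 * ret is the ioctl() return value and err is the errno captured right
 * after the call. Hypothetical helper, not taken from stress-ng.
 */
static bool rtc_ioctl_is_failure(const char *name, int ret, int err)
{
    if (ret == 0)
        return false;   /* ioctl succeeded */
    if (err == EINVAL)
        return false;   /* treated as "not implemented": silently ignored */

    /* Any other errno is reported and counted as a failure. */
    fprintf(stderr, "rtc: ioctl %s failed, errno=%d (%s)\n",
            name, err, strerror(err));
    return true;
}
```

A caller would pass the name of the ioctl it just issued, for example `rtc_ioctl_is_failure("RTC_WKALRM_RD", ret, errno)`. The trade-off of this design is that a genuine bad-argument EINVAL is hidden along with the unimplemented case.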

ColinIanKing commented 6 months ago

Ah, ok, I've pushed the following RTC fix for the other ioctls:

commit 4e444914f35700b42e46f1305d9efd8f2705c537 (HEAD -> master)
Author: Colin Ian King <colin.i.king@gmail.com>
Date:   Wed May 8 18:28:13 2024 +0100

stress-rtc: ignore ioctl EINVAL errors if the ioctl is not implemented

Older kernels may not support the RTC_* ioctls, so silently ignore

ColinIanKing commented 6 months ago

undefined stress_asm_ppc64_yield() fixed with commit:

commit f4ea9eb5edc894efbfc025b3eb40bd6fc2672e00 (HEAD -> master)
Author: Colin Ian King <colin.i.king@gmail.com>
Date:   Wed May 8 18:51:10 2024 +0100

core-lock: include asm headers for arch specific pause/yield ops

Fixes: d32613839d3b ("core-lock: add pause/yield in spinlock for architectures other than x86")

ColinIanKing commented 6 months ago

undefined lsm system call fixed with commit:

commit 9dfdfd1658907448120e39729ccd0dfb0afc32dc
Author: Colin Ian King <colin.i.king@gmail.com>
Date:   Wed May 8 19:04:40 2024 +0100

stress-lsm: fix missing shim_lsm_set_self_attr helper

Fixes commit 87c7e613dcf1 ("stress-lsm: exercise lsm set with invalid ctx_len")

Cypresslin commented 6 months ago

Verified on F-5.4 and N-6.8 PowerPC VMs. With these commits stress-ng builds without any issue and the test passes, thanks!