KianTechHub closed this issue 3 weeks ago.
I've just pushed a few more commits to the repository that should help catch and debug where the SIGSEGV is occurring. Would you mind pulling these changes, rebuilding and re-testing so we can figure out where the issue occurs?
OK, I will do it. As soon as I find something, I will report back.
By the way, I modified the code: core-helper.c added #include
Without this change the compiler reports an error. I am using Android NDK static compilation.
In the generated config.h I had to comment out the following two lines; otherwise compilation can fail with undefined-identifier errors:
//#define HAVE_PTHREAD_PRIO_INHERIT
//#define HAVE_PTHREAD_PRIO_NONE
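One way to keep config.h accurate on toolchains such as the NDK is to define these macros only when a small probe compiles and links against the target libc. A minimal sketch of such a probe, assuming a configure step that builds it and defines HAVE_PTHREAD_PRIO_INHERIT only on success; probe_prio_inherit() is a hypothetical name, not part of stress-ng:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>

/*
 * Hypothetical configure-time probe. If this compiles and links, the
 * libc provides PTHREAD_PRIO_INHERIT and pthread_mutexattr_setprotocol(),
 * so it is reasonable to define HAVE_PTHREAD_PRIO_INHERIT. If either is
 * missing, the build of the probe fails and the macro stays undefined.
 * Returns 0 on success, otherwise the pthread error code.
 */
static int probe_prio_inherit(void)
{
	pthread_mutexattr_t attr;
	int ret;

	ret = pthread_mutexattr_init(&attr);
	if (ret != 0)
		return ret;
	ret = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
	(void)pthread_mutexattr_destroy(&attr);
	return ret;
}
```

The same pattern with PTHREAD_PRIO_NONE would cover the second macro. The point of probing rather than hard-coding is that a libc lacking these symbols fails the probe instead of failing the main build.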
This is the new test result, which still fails after pulling the latest code at commit 4a11ac95d6549284df416326a80c4e9db0030740:
console:/data/local/tmp # ./stress-ng --icache 2 -v
stress-ng: debug: [10770] invoked with './stress-ng --icache 2 -v' by user 0
stress-ng: debug: [10770] stress-ng 0.18.05 g4a11ac95d654
stress-ng: debug: [10770] system: Linux localhost 5.4.254+ #1 SMP PREEMPT Wed May 8 09:34:33 CST 2024 aarch64, clang 17.0.2, unknown libc version, little endian
stress-ng: debug: [10770] RAM total: 3.0G, RAM free: 1.9G, swap free: 2.3G
stress-ng: debug: [10770] temporary file path: '/data/local/tmp', filesystem type: f2fs (2185794 blocks available)
stress-ng: debug: [10770] CPUs have 3 idle states: BUSY, WFI, cpu-sleep-0
stress-ng: debug: [10770] 4 processors online, 4 processors configured
stress-ng: info: [10770] defaulting to a 1 day run per stressor
stress-ng: debug: [10770] cache allocate: using defaults, cannot determine cache level details
stress-ng: debug: [10770] cache allocate: shared cache buffer size: 2048K
stress-ng: info: [10770] dispatching hogs: 2 icache
stress-ng: debug: [10770] starting stressors
stress-ng: debug: [10770] 2 stressors started
stress-ng: debug: [10771] icache: [10771] started (instance 0 on CPU 3)
stress-ng: debug: [10772] icache: [10772] started (instance 1 on CPU 0)
stress-ng: debug: [10771] caught SIGSEGV, address 0x0000000000000f04 (SEGV_MAPERR)
stress-ng: debug: [10771] stress-ng: info: 0x0000000000000f00 not readable
stress-ng: debug: [10771] stress-ng: info: 0x0000000000000f10 not readable
stress-ng: debug: [10771] stress-ng: info: 0x0000000000000f20 not readable
stress-ng: error: [10770] icache: [10771] terminated with an error, exit status=2 (stressor failed)
stress-ng: debug: [10770] icache: [10771] terminated (stressor failed)
stress-ng: debug: [10772] caught SIGSEGV, address 0x0000000000000f04 (SEGV_MAPERR)
stress-ng: debug: [10772] stress-ng: info: 0x0000000000000f00 not readable
stress-ng: debug: [10772] stress-ng: info: 0x0000000000000f10 not readable
stress-ng: debug: [10772] stress-ng: info: 0x0000000000000f20 not readable
stress-ng: error: [10770] icache: [10772] terminated with an error, exit status=2 (stressor failed)
stress-ng: debug: [10770] icache: [10772] terminated (stressor failed)
stress-ng: warn: [10770] metrics-check: all bogo-op counters are zero, data may be incorrect
stress-ng: debug: [10770] metrics-check: all stressor metrics validated and sane
stress-ng: info: [10770] skipped: 0
stress-ng: info: [10770] passed: 0
stress-ng: info: [10770] failed: 2: icache (2)
stress-ng: info: [10770] metrics untrustworthy: 0
stress-ng: info: [10770] unsuccessful run completed in 0 secs
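SEGV_MAPERR means there is no mapping at the faulting address. 0xf04 lies inside the first 4 KiB page, which is normally unmapped on Linux, so the fault is consistent with dereferencing a small offset from a NULL or uninitialized pointer, although the log alone does not prove that. Messages of the form "caught SIGSEGV, address ... (SEGV_MAPERR)" can be produced by an SA_SIGINFO handler that reads si_addr and si_code. A minimal standalone sketch of that technique (hypothetical names, not stress-ng's actual handler) that probes one address and reports the fault instead of crashing:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static sigjmp_buf fault_jmp;
static void *volatile fault_addr;          /* si_addr of the last fault */
static volatile sig_atomic_t fault_code;   /* si_code of the last fault */

static void segv_handler(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;
	fault_addr = info->si_addr;
	fault_code = info->si_code;
	siglongjmp(fault_jmp, 1);          /* unwind back into read_faults() */
}

/*
 * Read one 32-bit word at p. Returns 1 if the read raised SIGSEGV,
 * 0 if it succeeded, -1 if the handler could not be installed.
 */
static int read_faults(const volatile uint32_t *p)
{
	struct sigaction sa, old_sa;
	int faulted = 0;

	assert(p != NULL);
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = segv_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, &old_sa) != 0)
		return -1;

	/* savemask=1 so the blocked SIGSEGV is unblocked after the jump */
	if (sigsetjmp(fault_jmp, 1) == 0) {
		volatile uint32_t sink = *p;   /* the probe read; may fault */
		(void)sink;
	} else {
		faulted = 1;
		fprintf(stderr, "caught SIGSEGV, address %p (%s)\n", fault_addr,
			fault_code == SEGV_MAPERR ? "SEGV_MAPERR" :
			fault_code == SEGV_ACCERR ? "SEGV_ACCERR" : "other");
	}
	(void)sigaction(SIGSEGV, &old_sa, NULL);   /* restore previous handler */
	return faulted;
}
```

Probing a few addresses around the fault (as the "not readable" lines in the log do) helps distinguish a bad pointer from a mapping that was removed while in use.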
That's very unexpected. Can you comment out the following two lines in function stress_icache_func() in stress-icache.c, rebuild and re-test and see if the icache flushing is causing the issue:
//shim_flush_icache((char *)page, (char *)page + 64);
*vaddr = val;
//shim_flush_icache((char *)page, (char *)page + 64);
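Commenting out both calls separates the store from the flush: if the fault disappears, the flush rather than the write is implicated. For reference, the usual self-modifying-code ordering is to write first and then synchronize the instruction cache over the modified bytes. A minimal sketch using the GCC/Clang builtin __builtin___clear_cache(); patch_word() is a hypothetical helper, not stress-ng code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Store val at vaddr, then make the new bytes visible to instruction
 * fetch. On x86, whose caches are coherent, the builtin emits nothing;
 * on arm64 it typically cleans the D-cache lines and invalidates the
 * I-cache lines covering [begin, end).
 */
static void patch_word(uint32_t *vaddr, uint32_t val)
{
	char *begin;
	char *end;

	assert(vaddr != NULL);
	begin = (char *)vaddr;
	end = begin + sizeof(*vaddr);

	*vaddr = val;                          /* 1: data-side store */
	__builtin___clear_cache(begin, end);   /* 2: sync the I-cache */
}
```

For example, patch_word(&w, 0xd503201f) stores the AArch64 NOP encoding into w; the sketch only writes and flushes, it never executes the stored word.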
By the way, I modified the code: core-helper.c added #include
stress-workload.c added #include
Without these changes the compiler reports an error. I am using Android NDK static compilation.
Thanks for letting me know, I've added these changes to stress-ng
In the generated config.h I had to comment out the following two lines; otherwise compilation can fail with undefined-identifier errors:
//#define HAVE_PTHREAD_PRIO_INHERIT
//#define HAVE_PTHREAD_PRIO_NONE
I've fixed this and pushed it to the repo.
All compilation errors have been resolved, and it can be successfully compiled without any modifications.
Test results: the behavior changed and seems improved compared to before; some stressor instances passed. However, during testing the program sometimes appeared to freeze, and I had to press Ctrl+C to exit.
stress-ng: debug: [2042] invoked with './stress-ng --icache 4 -v' by user 0
stress-ng: debug: [2042] stress-ng 0.18.05 g1cb7016a5151
stress-ng: debug: [2042] system: Linux localhost 5.4.254+ #1 SMP PREEMPT Wed May 8 09:34:33 CST 2024 aarch64, clang 17.0.2, unknown libc version, little endian
stress-ng: debug: [2042] RAM total: 3.0G, RAM free: 1.5G, swap free: 2.3G
stress-ng: debug: [2042] temporary file path: '/data/local/tmp', filesystem type: f2fs (2185584 blocks available)
stress-ng: debug: [2042] CPUs have 3 idle states: BUSY, WFI, cpu-sleep-0
stress-ng: debug: [2042] 4 processors online, 4 processors configured
stress-ng: info: [2042] defaulting to a 1 day run per stressor
stress-ng: debug: [2042] cache allocate: using defaults, cannot determine cache level details
stress-ng: debug: [2042] cache allocate: shared cache buffer size: 2048K
stress-ng: info: [2042] dispatching hogs: 4 icache
stress-ng: debug: [2042] starting stressors
stress-ng: debug: [2043] icache: [2043] started (instance 0 on CPU 1)
stress-ng: debug: [2044] icache: [2044] started (instance 1 on CPU 3)
stress-ng: debug: [2042] 4 stressors started
stress-ng: debug: [2045] icache: [2045] started (instance 2 on CPU 0)
stress-ng: debug: [2046] icache: [2046] started (instance 3 on CPU 3)
stress-ng: debug: [2045] caught SIGSEGV, address 0x0000000000000f04 (SEGV_MAPERR)
stress-ng: debug: [2045] stress-ng: info: 0x0000000000000f00 not readable
stress-ng: debug: [2045] stress-ng: info: 0x0000000000000f10 not readable
stress-ng: debug: [2045] stress-ng: info: 0x0000000000000f20 not readable
stress-ng: debug: [2043] caught SIGSEGV, address 0x0000000000000f04 (SEGV_MAPERR)
stress-ng: debug: [2043] stress-ng: info: 0x0000000000000f00 not readable
stress-ng: debug: [2043] stress-ng: info: 0x0000000000000f10 not readable
stress-ng: debug: [2043] stress-ng: info: 0x0000000000000f20 not readable
stress-ng: error: [2042] icache: [2043] terminated with an error, exit status=2 (stressor failed)
stress-ng: debug: [2042] icache: [2043] terminated (stressor failed)
stress-ng: debug: [2046] caught SIGSEGV, address 0x0000000000000f04 (SEGV_MAPERR)
stress-ng: debug: [2046] stress-ng: info: 0x0000000000000f00 not readable
stress-ng: debug: [2046] stress-ng: info: 0x0000000000000f10 not readable
stress-ng: debug: [2046] stress-ng: info: 0x0000000000000f20 not readable
^C // Pressing Enter repeatedly had no effect, so I pressed Ctrl+C, and the program then continued.
stress-ng: debug: [2044] icache: [2044] exited (instance 1 on CPU 3)
stress-ng: debug: [2042] icache: [2044] terminated (success)
stress-ng: error: [2042] icache: [2045] terminated with an error, exit status=2 (stressor failed)
stress-ng: debug: [2042] icache: [2045] terminated (stressor failed)
stress-ng: error: [2042] icache: [2046] terminated with an error, exit status=2 (stressor failed)
stress-ng: debug: [2042] icache: [2046] terminated (stressor failed)
stress-ng: debug: [2042] metrics-check: all stressor metrics validated and sane
stress-ng: info: [2042] skipped: 0
stress-ng: info: [2042] passed: 1: icache (1)
stress-ng: info: [2042] failed: 3: icache (3)
stress-ng: info: [2042] metrics untrustworthy: 0
stress-ng: info: [2042] unsuccessful run completed in 8.63 secs
OK, can you also comment out:
icache_func();
//(void)shim_cacheflush((char *)page, page_size, SHIM_ICACHE);
...then rebuild and retest to see if the icache flush is causing the issue.
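Rather than editing the source for each experiment, the flush can be placed behind a compile-time switch so both variants build from the same tree. A minimal sketch; DEBUG_SKIP_ICACHE_FLUSH and debug_flush_icache() are hypothetical names, not existing stress-ng options:

```c
#include <assert.h>

/*
 * Hypothetical debug switch: compile with -DDEBUG_SKIP_ICACHE_FLUSH to
 * drop the flush without touching the source.
 * Returns 1 if a flush was issued, 0 if it was skipped.
 */
static int debug_flush_icache(char *begin, char *end)
{
	assert(begin <= end);
#if defined(DEBUG_SKIP_ICACHE_FLUSH)
	(void)begin;
	(void)end;
	return 0;
#else
	__builtin___clear_cache(begin, end);
	return 1;
#endif
}
```

Building once normally and once with the extra define in the compiler flags, then comparing runs of ./stress-ng --icache 8 -v, gives the same A/B comparison as commenting the call out by hand.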
Result with shim_cacheflush commented out:
console:/data/local/tmp # ./stress-ng --icache 8 -v
stress-ng: debug: [2847] invoked with './stress-ng --icache 8 -v' by user 0
stress-ng: debug: [2847] stress-ng 0.18.05 g1cb7016a5151
stress-ng: debug: [2847] system: Linux localhost 5.4.254+ #1 SMP PREEMPT Wed Oct 23 17:20:23 CST 2024 aarch64, clang 17.0.2, unknown libc version, little endian
stress-ng: debug: [2847] RAM total: 3.0G, RAM free: 1.1G, swap free: 2.3G
stress-ng: debug: [2847] temporary file path: '/data/local/tmp', filesystem type: f2fs (2187892 blocks available)
stress-ng: debug: [2847] CPUs have 3 idle states: BUSY, WFI, cpu-sleep-0
stress-ng: debug: [2847] 4 processors online, 4 processors configured
stress-ng: info: [2847] defaulting to a 1 day run per stressor
stress-ng: debug: [2847] cache allocate: using defaults, cannot determine cache level details
stress-ng: debug: [2847] cache allocate: shared cache buffer size: 2048K
stress-ng: info: [2847] dispatching hogs: 8 icache
stress-ng: debug: [2847] starting stressors
stress-ng: debug: [2848] icache: [2848] started (instance 0 on CPU 3)
stress-ng: debug: [2849] icache: [2849] started (instance 1 on CPU 2)
stress-ng: debug: [2850] icache: [2850] started (instance 2 on CPU 0)
stress-ng: debug: [2851] icache: [2851] started (instance 3 on CPU 2)
stress-ng: debug: [2852] icache: [2852] started (instance 4 on CPU 0)
stress-ng: debug: [2847] 8 stressors started
stress-ng: debug: [2854] icache: [2854] started (instance 6 on CPU 1)
stress-ng: debug: [2853] icache: [2853] started (instance 5 on CPU 1)
stress-ng: debug: [2855] icache: [2855] started (instance 7 on CPU 1)
^C // Pressing Enter repeatedly had no effect, so I pressed Ctrl+C, and the program then continued.
^C
stress-ng: debug: [2850] icache: [2850] exited (instance 2 on CPU 0)
stress-ng: debug: [2854] icache: [2854] exited (instance 6 on CPU 3)
stress-ng: debug: [2849] icache: [2849] exited (instance 1 on CPU 2)
stress-ng: debug: [2855] icache: [2855] exited (instance 7 on CPU 1)
stress-ng: debug: [2852] icache: [2852] exited (instance 4 on CPU 0)
stress-ng: debug: [2853] icache: [2853] exited (instance 5 on CPU 1)
stress-ng: debug: [2851] icache: [2851] exited (instance 3 on CPU 2)
stress-ng: debug: [2848] icache: [2848] exited (instance 0 on CPU 3)
stress-ng: debug: [2847] icache: [2848] terminated (success)
stress-ng: debug: [2847] icache: [2849] terminated (success)
stress-ng: debug: [2847] icache: [2850] terminated (success)
stress-ng: debug: [2847] icache: [2851] terminated (success)
stress-ng: debug: [2847] icache: [2852] terminated (success)
stress-ng: debug: [2847] icache: [2853] terminated (success)
stress-ng: debug: [2847] icache: [2854] terminated (success)
stress-ng: debug: [2847] icache: [2855] terminated (success)
stress-ng: debug: [2847] metrics-check: all stressor metrics validated and sane
stress-ng: info: [2847] skipped: 0
stress-ng: info: [2847] passed: 8: icache (8)
stress-ng: info: [2847] failed: 0
stress-ng: info: [2847] metrics untrustworthy: 0
stress-ng: info: [2847] successful run completed in 4.63 secs
So this can be either one of two things:
I don't believe this is a stress-ng issue, I think this is an instruction cache flushing issue in the above function/system calls.
OK, I'm not an expert on the icache and I don't have any further findings. Anyway, thank you for your help with troubleshooting.
If more testing and verification is needed, I'm happy to cooperate.
I can only suggest we add some debug output to the cacheflush shim function to see what's happening. In core-shim.c, in function shim_cacheflush(), can you add the pr_inf() debug lines shown below? The new debug code follows the /* Add debug ... */ comments:
#elif defined(HAVE_BUILTIN___CLEAR_CACHE)
/* More portable builtin */
(void)cache;
/* Add debug clear cache call */
pr_inf("__builtin___clear_cache(%p,%p)\n", (void *)addr, (void *)(addr + nbytes));
__builtin___clear_cache((void *)addr, (void *)(addr + nbytes));
return 0;
#elif defined(__NR_cacheflush) && \
defined(HAVE_SYSCALL)
/* potentially incorrect args, needs per-arch fixing */
/* Add debug cacheflush call */
pr_inf("cacheflush(%p,%d,%d)\n", (void *)addr, nbytes, cache);
return (int)syscall(__NR_cacheflush, addr, nbytes, cache);
#else
return (int)shim_enosys(0, addr, nbytes, cache);
#endif
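On arm64, __builtin___clear_cache() typically walks the requested range one cache line at a time (DC CVAU for data lines, IC IVAU for instruction lines), with the minimum line sizes taken from the CTR_EL0 register, so each range printed by the debug line above maps onto a whole number of line operations. The arithmetic behind that walk, as a portable sketch with hypothetical helper names; it only decodes a CTR_EL0 value passed in and never reads the register itself:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * CTR_EL0.IminLine is bits [3:0] and CTR_EL0.DminLine is bits [19:16];
 * each holds log2 of the smallest line size in 4-byte words.
 */
static size_t ctr_iline_bytes(uint64_t ctr)
{
	return (size_t)4 << (ctr & 0xf);
}

static size_t ctr_dline_bytes(uint64_t ctr)
{
	return (size_t)4 << ((ctr >> 16) & 0xf);
}

/* Number of per-line maintenance operations needed to cover [begin, end). */
static size_t lines_to_maintain(uintptr_t begin, uintptr_t end, size_t line)
{
	uintptr_t first;

	assert(line != 0 && (line & (line - 1)) == 0);   /* power of two */
	if (end <= begin)
		return 0;
	first = begin & ~((uintptr_t)line - 1);          /* round down to a line */
	return (size_t)((end - first + line - 1) / line);
}
```

With the example value 0x84448004 both fields decode to 64-byte lines, so a 64-byte range starting on a 64-byte boundary needs one operation per cache, while the same length starting 16 bytes into a line needs two.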
Since this appears to be a kernel or arch specific issue and not a stress-ng issue, I'm going to close this. If you believe this is incorrect, please feel free to reopen this issue.
./stress-ng --icache 8192
stress-ng: info: [3703] stressor terminated with unexpected signal 11 'SIGSEGV'
stress-ng: info: [3719] stressor terminated with unexpected signal 11 'SIGSEGV'
stress-ng: info: [3771] stressor terminated with unexpected signal 11 'SIGSEGV'
stress-ng: info: [3802] stressor terminated with unexpected signal 11 'SIGSEGV'
stress-ng: info: [3828] stressor terminated with unexpected signal 11 'SIGSEGV'
stress-ng: info: [27138] skipped: 0
stress-ng: info: [27138] passed: 4734: icache (4734)
stress-ng: info: [27138] failed: 3458: icache (3458)
stress-ng: info: [27138] metrics untrustworthy: 0
stress-ng: info: [27138] unsuccessful run completed in 12.22 secs
Why does the icache stressor get SIGSEGV?
I am using an arm64 Android system. Almost half of all icache stressor instances fail (3458 of 8192 in the run above, about 42%). My system has 4 cores, and testing with different parameters still reports errors.
```
console:/data/local/tmp # ./stress-ng --icache 4
stress-ng: info: [3916] defaulting to a 1 day run per stressor
stress-ng: info: [3916] dispatching hogs: 4 icache
stress-ng: info: [3917] stressor terminated with unexpected signal 11 'SIGSEGV'
stress-ng: info: [3916] skipped: 0
stress-ng: info: [3916] passed: 3: icache (3)
stress-ng: info: [3916] failed: 1: icache (1)
stress-ng: info: [3916] metrics untrustworthy: 0
stress-ng: info: [3916] unsuccessful run completed in 0 secs

2|console:/data/local/tmp # ./stress-ng --icache 1
stress-ng: info: [3921] defaulting to a 1 day run per stressor
stress-ng: info: [3921] dispatching hogs: 1 icache
stress-ng: info: [3922] stressor terminated with unexpected signal 11 'SIGSEGV'
stress-ng: warn: [3921] metrics-check: all bogo-op counters are zero, data may be incorrect
stress-ng: info: [3921] skipped: 0
stress-ng: info: [3921] passed: 0
stress-ng: info: [3921] failed: 1: icache (1)
stress-ng: info: [3921] metrics untrustworthy: 0
stress-ng: info: [3921] unsuccessful run completed in 0 secs

2|console:/data/local/tmp # ./stress-ng --icache 2 -v
stress-ng: debug: [3927] invoked with './stress-ng --icache 2 -v' by user 0
stress-ng: debug: [3927] stress-ng 0.18.05 ga808c8977db7
stress-ng: debug: [3927] system: Linux localhost 5.4.254+ #1 SMP PREEMPT Sat Sep 7 22:27:16 CST 2024 aarch64, clang 17.0.2, unknown libc version, little endian
stress-ng: debug: [3927] RAM total: 3.0G, RAM free: 1.1G, swap free: 2.3G
stress-ng: debug: [3927] temporary file path: '/data/local/tmp', filesystem type: f2fs (2177311 blocks available)
stress-ng: debug: [3927] CPUs have 3 idle states: BUSY, WFI, cpu-sleep-0
stress-ng: debug: [3927] 4 processors online, 4 processors configured
stress-ng: info: [3927] defaulting to a 1 day run per stressor
stress-ng: debug: [3927] cache allocate: using defaults, cannot determine cache level details
stress-ng: debug: [3927] cache allocate: shared cache buffer size: 2048K
stress-ng: info: [3927] dispatching hogs: 2 icache
stress-ng: debug: [3927] starting stressors
stress-ng: debug: [3927] 2 stressors started
stress-ng: debug: [3928] icache: [3928] started (instance 0 on CPU 0)
stress-ng: debug: [3929] icache: [3929] started (instance 1 on CPU 1)
stress-ng: info: [3928] stressor terminated with unexpected signal 11 'SIGSEGV'
stress-ng: debug: [3927] icache: [3928] aborted via a termination signal
stress-ng: debug: [3927] icache: [3928] terminated (killed by signal)
stress-ng: debug: [3929] icache: [3929] exited (instance 1 on CPU 1)
stress-ng: debug: [3927] icache: [3929] terminated (success)
stress-ng: debug: [3927] metrics-check: all stressor metrics validated and sane
stress-ng: info: [3927] skipped: 0
stress-ng: info: [3927] passed: 1: icache (1)
stress-ng: info: [3927] failed: 1: icache (1)
stress-ng: info: [3927] metrics untrustworthy: 0
stress-ng: info: [3927] unsuccessful run completed in 0 secs
```