mator closed this 3 years ago
It may be worth replacing __process_madvise with __NR_process_madvise in that failed build to see whether that fixes the build failure, so you can continue the bisect without needing to skip.
I've been running this on a sparc64 Debian QEMU installation with a 4.15 kernel and noticed that fork() sometimes returns the wrong PID: it matches the PID of the existing stressor, which then gets killed, stopping the stress test prematurely. This only happens when I run 2 or more stressors. It's most bizarre.
I've pushed a fix for the issues I'm seeing on SPARC64. Could you pull the latest tip, rebuild, and let me know whether this helps? What kernel are you using?
@ColinIanKing it seems to be fixed:
$ git desc
V0.12.06-20-g3466c47c
$ ./stress-ng -v --fork 2 --timeout 10s --metrics-brief
stress-ng: debug: [170539] system: Linux ttip 5.12.0-rc5 #204 SMP Mon Mar 29 10:19:44 MSK 2021 sparc64
stress-ng: debug: [170539] RAM total: 33.5G, RAM free: 31.8G, SWAP free: 768.7M
stress-ng: debug: [170539] 24 processors online, 256 processors configured
stress-ng: info: [170539] dispatching hogs: 2 fork
stress-ng: debug: [170539] cache allocate: using defaults, can't determine cache details from sysfs
stress-ng: debug: [170539] cache allocate: default cache size: 2048K
stress-ng: debug: [170539] starting stressors
stress-ng: debug: [170539] 2 stressors started
stress-ng: debug: [170540] stress-ng-fork: started [170540] (instance 0)
stress-ng: debug: [170541] stress-ng-fork: started [170541] (instance 1)
stress-ng: debug: [170540] stress-ng-fork: exited [170540] (instance 0)
stress-ng: debug: [170541] stress-ng-fork: exited [170541] (instance 1)
stress-ng: debug: [170539] process [170540] terminated
stress-ng: debug: [170539] process [170541] terminated
stress-ng: info: [170539] successful run completed in 10.00s
stress-ng: info: [170539] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s
stress-ng: info: [170539]                           (secs)    (secs)    (secs)  (real time) (usr+sys time)
stress-ng: info: [170539] fork              20208     10.00     13.09      6.39      2020.79        1037.37
stress-ng: debug: [170539] metrics-check: all stressor metrics validated and sane
Thanks.
PS: this strace commit, https://github.com/strace/strace/commit/c4cff2a7a66629bd95fda9bada84a639c59cda3c, may explain some of the sparc64 fork issues.
Yep, that strace commit explains it. I was calling syscall(__NR_fork) for a random subset of the fork calls, so the child was getting the parent's PID back instead of 0, which explains the parent stressor being killed at random. Doh.
Hello!
I'm trying to bisect a fork issue on the sparc64 platform that seems to have been introduced recently (?). Currently it looks like this:
A "good" version looks like this:
So far, my bisect log looks like this:
where
git bisect skip
is used when stress-ng fails to compile with the following error (on a random non-compilable commit, id eb910081):

I'm going to finish bisecting, but I wonder whether the bisect will just end up at a compilable version rather than the actual commit that introduced the fork issue... Any advice so far?
PS: bisect ended up in