Closed — lucasclucasdo closed this issue 6 years ago.
Still seeing the same issue. Using your branch with the commits
commit aee8ca9f0a85562a9eb8007ba9b41f39204e79b8
Author: Lucas Crowthers <lucasc.qdt@qualcommdatacenter.com>
Date:   Mon May 14 17:21:34 2018 +0000
Lockhammer: Assign thread affinity prior to creation
Set thread affinity to a particular hardware thread prior to thread
creation. This creates a potential disconnect between the start
order and the cpu on which the thread is running necessitating some
changes to the synchronized start code and the addition of a core
number thread argument in order to correctly index per-cpu variables
inside the locks algorithms.
Fixes #16
Change-Id: Ia27d70e7d2875637a4ef1514e61636fc1b662698
commit 12b45f8fa16534a217301bddc2a1aa56a9748a1a
Author: Lucas Crowthers <lucasc.qdt@qualcommdatacenter.com>
Date:   Mon May 14 16:53:33 2018 +0000
Lockhammer: Add arbitrary core interleave argument
---- added -i4 as per your suggestion ----

diff --git a/benchmarks/lockhammer/scripts/sweep.sh b/benchmarks/lockhammer/scripts/sweep.sh
index f90d534..50e6d9d 100755
--- a/benchmarks/lockhammer/scripts/sweep.sh
+++ b/benchmarks/lockhammer/scripts/sweep.sh
@@ -48,7 +48,7 @@ do
     fi
     echo Test: ${1} CPU: exectx=$c Date: `date` 1>&2
Got kernel call trace...
Test: ticket_spinlock CPU: exectx=120 Date: Tue May 15 01:26:11 EDT 2018
399960 lock loops
149354365575 ns scheduled
1269883909 ns elapsed (~117.612610 cores)
373423.256263 ns per access
3175.027275 ns access rate
107.847190 average depth
Test: ticket_spinlock CPU: exectx=128 Date: Tue May 15 01:26:17 EDT 2018
[ 5923.382537] INFO: task kworker/u449:6:2190 blocked for more than 120 seconds.
[ 5923.389664] Not tainted 4.14.31-24.cavium.ml.aarch64 #1
[ 5923.395409] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5923.403231] kworker/u449:6 D 0 2190 2 0x00000020
[ 5923.408723] Workqueue: writeback wb_workfn (flush-8:0)
[ 5923.413855] Call trace:
[ 5923.416299] [
After setting echo 0 > /proc/sys/kernel/hung_task_timeout_secs, the test still hangs.
Would you mind trying a couple of things?
echo -1 > /proc/sys/kernel/sched_rt_runtime_us
Generally this is a very bad idea (tm), but lockhammer depends on being able to schedule as many cores as requested in order to give accurate results, and it should spend very little time actually running once all requested cores are scheduled.
Important Edit: I should probably point out that if doing the above doesn't help, it'll probably hurt even more, to the point that you might have to power cycle the system under test. So don't try it if power cycling isn't an option.
As per the code, child thread 0 started by main() spends most of its time in a wait loop before starting the actual test code. When running with > 200 cores, it is observed that the main() thread's pthread_create gets blocked while 99% of the cpu is consumed by ldarx in the wait loop in child thread 0.
Did a small hack test, moving child thread 0 to core 1 and child thread 1 to core 2. runall.sh completed fully with this patch:
https://github.com/mjaggi-cavium/lockhammer/commit/a230d1cd18359c45235971157e948f6343927a95
Often on systems with multiple hardware threads, the logical core numbering is assigned such that lower-numbered logical cores reference the first hardware thread on each physical core and higher-numbered logical cores reference subsequent hardware threads on those same cores. For example, on a 4-core/8-thread system, logical core 0 references the first thread on physical core 0, logical core 1 references the first thread on physical core 1, logical core 4 references the second thread on physical core 0, and so on. Currently lockhammer adjusts its core population to fill up all hardware threads on each physical core first, but does so in a way that only works correctly for 2-thread-per-core systems. A generic mechanism for specifying the number of "regions" of logical core numbering should be added, along with a way to specify the correct value on the command line.