ARM-software / synchronization-benchmarks

Collection of synchronization micro-benchmarks and traces from infrastructure applications

Add a core-number interleaving command line option #17

Closed lucasclucasdo closed 6 years ago

lucasclucasdo commented 6 years ago

Often on systems with multiple hardware threads per core, the logical core numbering is assigned such that lower-numbered logical cores reference the first hardware thread on each physical core and higher-numbered logical cores reference subsequent hardware threads on those same cores. For example, on a 4-core, 8-thread system, logical core 0 references the first thread on physical core 0, logical core 1 references the first thread on physical core 1, logical core 4 references the second thread on physical core 0, and so on. Currently lockhammer adjusts its core population to fill up all hardware threads on each physical core first, but it does so in a way that only works correctly for 2-thread-per-core systems. A generic mechanism for specifying the number of "regions" of logical core numbering should be added, along with a way to specify the correct value on the command line.
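
For illustration, below is a minimal sketch of the kind of mapping such an option would control; the function name interleaved_core and the exact formula are assumptions for exposition, not necessarily how lockhammer implements it. With the interleave set to the number of hardware threads per physical core (2 in the example above), consecutive benchmark threads fill the sibling hardware threads of one physical core before moving on to the next; with an interleave of 1 the mapping is the identity.

    #include <stdio.h>

    /*
     * Hypothetical sketch of an interleaved core-number mapping.
     *
     * num_cores:  total number of logical cores
     * interleave: number of "regions" of logical core numbering, typically
     *             the number of hardware threads per physical core
     * thread:     0-based thread creation index
     */
    static unsigned long interleaved_core(unsigned long thread,
                                          unsigned long num_cores,
                                          unsigned long interleave)
    {
        unsigned long region_size = num_cores / interleave;  /* logical cores per region */

        /* Walk the regions round-robin: thread 0 -> region 0, thread 1 -> region 1, ... */
        return (thread % interleave) * region_size + thread / interleave;
    }

    int main(void)
    {
        /* 4-core / 8-thread example from above: expected order 0 4 1 5 2 6 3 7 */
        for (unsigned long t = 0; t < 8; t++)
            printf("thread %lu -> logical core %lu\n",
                   t, interleaved_core(t, 8, 2));
        return 0;
    }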

mjaggi-cavium commented 6 years ago

Still seeing the same issue. Using your branch with the following commits:

commit aee8ca9f0a85562a9eb8007ba9b41f39204e79b8
Author: Lucas Crowthers <lucasc.qdt@qualcommdatacenter.com>
Date:   Mon May 14 17:21:34 2018 +0000

Lockhammer: Assign thread affinity prior to creation

Set thread affinity to a particular hardware thread prior to thread
creation.  This creates a potential disconnect between the start
order and the cpu on which the thread is running necessitating some
changes to the synchronized start code and the addition of a core
number thread argument in order to correctly index per-cpu variables
inside the locks algorithms.

Fixes #16

Change-Id: Ia27d70e7d2875637a4ef1514e61636fc1b662698
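
As a rough illustration of the approach this commit message describes (fixing a thread's affinity before it is created), the GNU extension pthread_attr_setaffinity_np can be attached to the creation attributes. This is a hedged sketch, not the actual lockhammer patch; the helper name create_pinned_thread is invented.

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>

    static void *worker(void *arg)
    {
        /* The thread starts life already pinned; sched_getcpu() reports where. */
        printf("thread %ld running on cpu %d\n", (long) arg, sched_getcpu());
        return NULL;
    }

    /* Create a thread whose affinity is fixed to `cpu` before it ever runs. */
    static int create_pinned_thread(pthread_t *tid, long idx, int cpu)
    {
        pthread_attr_t attr;
        cpu_set_t mask;
        int rc;

        CPU_ZERO(&mask);
        CPU_SET(cpu, &mask);

        pthread_attr_init(&attr);
        rc = pthread_attr_setaffinity_np(&attr, sizeof(mask), &mask);
        if (rc == 0)
            rc = pthread_create(tid, &attr, worker, (void *) idx);
        pthread_attr_destroy(&attr);
        return rc;
    }

    int main(void)
    {
        pthread_t tid;
        int rc = create_pinned_thread(&tid, 0, 1);  /* hypothetical: pin thread 0 to cpu 1 */

        if (rc != 0) {
            fprintf(stderr, "create_pinned_thread: %s\n", strerror(rc));
            return 1;
        }
        pthread_join(tid, NULL);
        return 0;
    }

Because the affinity is part of the creation attributes, the thread never has to be migrated after it starts, which is why the commit also adds a core-number thread argument to index per-cpu data instead of relying on start order.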

commit 12b45f8fa16534a217301bddc2a1aa56a9748a1a
Author: Lucas Crowthers <lucasc.qdt@qualcommdatacenter.com>
Date:   Mon May 14 16:53:33 2018 +0000

Lockhammer: Add arbitrary core interleave argument

---- added -i4 as per your suggestion ----

diff --git a/benchmarks/lockhammer/scripts/sweep.sh b/benchmarks/lockhammer/scripts/sweep.sh
index f90d534..50e6d9d 100755
--- a/benchmarks/lockhammer/scripts/sweep.sh
+++ b/benchmarks/lockhammer/scripts/sweep.sh
@@ -48,7 +48,7 @@ do
             fi

            echo Test: ${1} CPU: exectx=$c Date: `date` 1>&2

Got kernel call trace...

Test: ticket_spinlock CPU: exectx=120 Date: Tue May 15 01:26:11 EDT 2018
399960 lock loops
149354365575 ns scheduled
1269883909 ns elapsed (~117.612610 cores)
373423.256263 ns per access
3175.027275 ns access rate
107.847190 average depth
Test: ticket_spinlock CPU: exectx=128 Date: Tue May 15 01:26:17 EDT 2018
[ 5923.382537] INFO: task kworker/u449:6:2190 blocked for more than 120 seconds.
[ 5923.389664] Not tainted 4.14.31-24.cavium.ml.aarch64 #1
[ 5923.395409] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5923.403231] kworker/u449:6 D 0 2190 2 0x00000020
[ 5923.408723] Workqueue: writeback wb_workfn (flush-8:0)
[ 5923.413855] Call trace:
[ 5923.416299] [] __switch_to+0x8c/0xa8
[ 5923.421429] [] __schedule+0x278/0x7e0
[ 5923.426648] [] schedule+0x34/0x8c
[ 5923.431524] [] io_schedule+0x1c/0x38
[ 5923.436657] [] __lock_page+0x114/0x15c
[ 5923.441969] [] write_cache_pages+0x450/0x4c4
[ 5923.448057] [] xfs_vm_writepages+0xc4/0xf0 [xfs]
[ 5923.454238] [] do_writepages+0x30/0x98
[ 5923.459539] [] __writeback_single_inode+0x5c/0x3f4
[ 5923.465885] [] writeback_sb_inodes+0x27c/0x4b8
[ 5923.471880] [] __writeback_inodes_wb+0xa4/0xe8
[ 5923.477879] [] wb_writeback+0x21c/0x348
[ 5923.483272] [] wb_workfn+0x298/0x418
[ 5923.488404] [] process_one_work+0x16c/0x380
[ 5923.494142] [] worker_thread+0x60/0x40c
[ 5923.499535] [] kthread+0x10c/0x138
[ 5923.504493] [] ret_from_fork+0x10/0x18
[ 5923.509794] INFO: task auditd:2892 blocked for more than 120 seconds.
[ 5923.516226] Not tainted 4.14.31-24.cavium.ml.aarch64 #1
[ 5923.521958] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5923.529780] auditd D 0 2892 1 0x00000000
[ 5923.535260] Call trace:
[ 5923.537695] [] __switch_to+0x8c/0xa8

mjaggi-cavium commented 6 years ago

After setting echo 0 > /proc/sys/kernel/hung_task_timeout_secs, the test still hangs.

lucasclucasdo commented 6 years ago

Would you mind trying a couple of things?

  1. Try using -i 1 and only running up to the number of physical cores on the system. This should populate physical cores first and avoid scheduling onto secondary hardware threads of the same physical cores. If this works, it may be the best general way to handle hyperthreaded systems, since hyperthreading also confuses the busy spinning used to simulate critical-section times.
  2. It's possible that RT throttling is purposefully holding back the scheduling of threads on the last few hardware threads in order to prevent loss of responsiveness. Counter-intuitively, this could cause the already-scheduled threads to occupy CPU time for much, much longer than necessary, actually hurting responsiveness. lockhammer is designed to complete each test very rapidly, so it might help to play with the RT throttling settings; specifically, you might try:

echo -1 > /proc/sys/kernel/sched_rt_runtime_us

Generally this is a very bad idea (tm), but lockhammer depends on being able to schedule onto as many cores as requested in order to give accurate results, and it should spend very little time actually running once all requested cores are scheduled.

Important Edit: I should probably point out that if doing the above doesn't help, it will probably hurt even more, to the point that you might have to power-cycle the system under test. Don't try it if that isn't an option.

mjaggi-cavium commented 6 years ago

As per the code, child thread 0, started by main(), spends most of its time in a wait loop before starting the actual test code. When running with > 200 cores, we observed that main()'s pthread_create gets blocked while 99% of the CPU is consumed by the load-exclusive (ldaxr) wait loop in child thread 0.

I did a small hack test, moving child thread 0 to core 1 and child thread 1 to core 2. runall.sh completed fully with this patch:

https://github.com/mjaggi-cavium/lockhammer/commit/a230d1cd18359c45235971157e948f6343927a95
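
For reference, here is a minimal sketch of the offset idea described above; the thread count, names, and fixed +1 offset are illustrative assumptions, and the linked commit may implement it differently. Each child is pinned one CPU higher than its index so that the CPU on which main() issues its pthread_create loop is not also running a busy-spinning child.

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    #define NUM_CHILDREN 4              /* illustrative thread count */

    static void *worker(void *arg)
    {
        printf("child %ld on cpu %d\n", (long) arg, sched_getcpu());
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NUM_CHILDREN];
        long ncpus = sysconf(_SC_NPROCESSORS_ONLN);

        for (long i = 0; i < NUM_CHILDREN; i++) {
            pthread_attr_t attr;
            cpu_set_t mask;

            CPU_ZERO(&mask);
            /* Offset by one: child i runs on cpu (i + 1), leaving cpu 0, where
             * main() typically runs its pthread_create loop, free of spinners. */
            CPU_SET((int) ((i + 1) % ncpus), &mask);

            pthread_attr_init(&attr);
            pthread_attr_setaffinity_np(&attr, sizeof(mask), &mask);
            pthread_create(&tid[i], &attr, worker, (void *) i);
            pthread_attr_destroy(&attr);
        }

        for (long i = 0; i < NUM_CHILDREN; i++)
            pthread_join(tid[i], NULL);

        return 0;
    }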