There's no parallelism in this benchmark, because the test is using threadcreate instead of proccreate.
Fair point, though the change in run time does suggest the type of lock has an effect.
Further, changing my benchmark to use `proccreate` instead of `threadcreate` yields similar results, with the mutex being substantially faster than the reader/writer lock on the platforms I tested (macOS, Linux, FreeBSD). Indeed, on Linux, Lock was twice as fast as RWLock. However, with the parallelism of `proccreate`, tests run substantially longer on all platforms.
I'll update the commit message with new data.
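(For context: `threadcreate` and `proccreate` take the same arguments, `(void (*fn)(void*), void *arg, uint stacksize)`, so switching a libthread benchmark between them is a one-line change in the spawn loop. The sketch below is illustrative only, not the benchmark under discussion; the names `worker`, `NWORKERS`, `STKSIZE`, and `NLOOPS` are made up. It only marks where the two calls differ and why the `proccreate` form is the one that exercises the fmt lock in parallel.)

#include <u.h>
#include <libc.h>
#include <thread.h>

enum { NWORKERS = 8, STKSIZE = 8192, NLOOPS = 10000 };

static void
worker(void *arg)
{
	Channel *done = arg;
	char buf[128];
	int i;

	/* Hammer the fmt path, which takes the lock under discussion. */
	for(i = 0; i < NLOOPS; i++)
		snprint(buf, sizeof buf, "%d of %d", i, NLOOPS);
	sendul(done, 0);
	threadexits(nil);
}

void
threadmain(int argc, char *argv[])
{
	Channel *done[NWORKERS];
	int i;

	(void)argc;
	(void)argv;
	for(i = 0; i < NWORKERS; i++){
		done[i] = chancreate(sizeof(ulong), 1);
		/* threadcreate(worker, done[i], STKSIZE): every worker is a
		 * coroutine in this one proc, so the loops run one at a
		 * time and the fmt lock is never contended. */
		proccreate(worker, done[i], STKSIZE);	/* each worker gets its own proc (an OS thread under plan9port) */
	}
	for(i = 0; i < NWORKERS; i++){
		recvul(done[i]);
		chanfree(done[i]);
	}
	threadexitsall(nil);
}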
My system is nothing like your system, apparently (MacBook Pro). For me, the QLock is 10X slower real time.
r$ cat x.c
#include <u.h>
#include <libc.h>
#include <thread.h>

void
thr(void *arg)
{
	Channel *c = (Channel *)arg;
	char buf[128];
	int i;

	for(i=0; i<10000; i++)
		snprint(buf, sizeof(buf), "%s%04d/%02d/%02d%s%s%s%s",
			"hi", 2020, 1, 1, "happy", "new", "year", "!");
	sendul(c, 0);
	threadexits(0);
}

enum {
	NTHREADS = 16,
	STACK = 4096,
};

void
threadmain(int argc, char *argv[])
{
	Channel *chans[NTHREADS];
	int i;

	(void)argc;
	(void)argv;
	for(i=0; i<NTHREADS; i++){
		chans[i] = chancreate(sizeof(ulong), 1);
		proccreate(thr, chans[i], STACK);
	}
	for(i=0; i<NTHREADS; i++){
		(void)recvul(chans[i]);
		chanfree(chans[i]);
	}
	threadexitsall(0);
}
r$ cat src/lib9/fmtlock2.c
#include <u.h>
#include <libc.h>

static RWLock fmtlock;

void
__fmtrlock(void)
{
	rlock(&fmtlock);
}

void
__fmtrunlock(void)
{
	runlock(&fmtlock);
}

void
__fmtwlock(void)
{
	wlock(&fmtlock);
}

void
__fmtwunlock(void)
{
	wunlock(&fmtlock);
}
r$ (cd src/lib9 && mk)
mk: 'default' is up to date
r$ 9c x.c && 9l x.o && 9 time ./a.out && 9 time ./a.out && 9 time ./a.out
0.50u 4.15s 0.40r ./a.out
0.51u 4.35s 0.42r ./a.out
0.50u 4.25s 0.41r ./a.out
r$
OK, make change to QLock:
r$ git diff src/lib9
diff --git a/src/lib9/fmtlock2.c b/src/lib9/fmtlock2.c
index b755daa3..0dc81651 100644
--- a/src/lib9/fmtlock2.c
+++ b/src/lib9/fmtlock2.c
@@ -1,28 +1,28 @@
 #include <u.h>
 #include <libc.h>
 
-static RWLock fmtlock;
+static QLock fmtlock;
 
 void
 __fmtrlock(void)
 {
-	rlock(&fmtlock);
+	qlock(&fmtlock);
 }
 
 void
 __fmtrunlock(void)
 {
-	runlock(&fmtlock);
+	qunlock(&fmtlock);
 }
 
 void
 __fmtwlock(void)
 {
-	wlock(&fmtlock);
+	qlock(&fmtlock);
 }
 
 void
 __fmtwunlock(void)
 {
-	wunlock(&fmtlock);
+	qunlock(&fmtlock);
 }
r$ (cd src/lib9 && mk)
9c fmtlock2.c
9ar rsc $PLAN9/lib/lib9.a fmtlock2.o
r$ 9c x.c && 9l x.o && 9 time ./a.out && 9 time ./a.out && 9 time ./a.out
1.98u 3.49s 4.32r ./a.out
2.08u 3.74s 4.60r ./a.out
2.14u 3.86s 4.73r ./a.out
r$
I'm not using a QLock, I'm using a spinlock (Lock, sans Q). Am I wrong?
Hmm; reading the man page, maybe I am wrong. It says this:
    Although Locks are the more primitive lock, they have limitations;
    for example, they cannot synchronize between tasks in the same
    proc.  Use QLocks instead.
Hmm...and on re-reading the BUGS section, it's not clear to me that the above is actually an issue on p9p.
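(For clarity, the Lock variant being discussed collapses both the read and the write paths onto a single mutex. A sketch of what that version of src/lib9/fmtlock2.c would look like, shown here as an illustration of the idea rather than as the exact patch:)

#include <u.h>
#include <libc.h>

static Lock fmtlock;	/* Lock, sans Q, as described above */

void
__fmtrlock(void)
{
	lock(&fmtlock);	/* readers and writers alike take the same lock */
}

void
__fmtrunlock(void)
{
	unlock(&fmtlock);
}

void
__fmtwlock(void)
{
	lock(&fmtlock);
}

void
__fmtwunlock(void)
{
	unlock(&fmtlock);
}

The man page's worry is that if one task in a proc holds the Lock and another task in the same proc spins on it, the holder never gets scheduled to release it; whether that can actually bite on the p9p fmt path is the open question above.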
Please drop this. I had the QLock/Lock version installed after that test and got an acme deadlock out of it. I am going to get rid of any locking on the fast path again.
Eek. That sounds weird, but ok.
The problem is that the lock - any lock - changes calls to print from being signal-safe to being signal-unsafe. That path needs to be non-blocking.
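(To make the hazard concrete: the sketch below, which is illustrative and not code from plan9port, shows the shape of the deadlock. If a note is delivered while the interrupted code is inside the locked fmt path, and the handler itself calls into the fmt routines, the handler waits on a lock owned by the very frame it interrupted, which can never run to release it.)

#include <u.h>
#include <libc.h>

/* Illustrative only: why any lock on the fmt fast path breaks
 * signal-safety. If the note below arrives while main() is inside
 * print() holding the fmt lock, the handler's fprint() tries to take
 * the same lock. The owner is the interrupted frame underneath the
 * handler, so it can never release it: a spinlock spins forever and a
 * QLock blocks forever. */
static int
onnote(void *v, char *msg)
{
	(void)v;
	fprint(2, "note: %s\n", msg);	/* re-enters the locked fmt path */
	return 1;	/* claim the note as handled */
}

void
main(void)
{
	atnotify(onnote, 1);
	for(;;)
		print("formatting while notes may arrive\n");	/* briefly holds the fmt lock */
}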
The use of reader/writer locks was actually a pessimization, as shown by the following benchmark and collected data:
Benchmark program:
-->CUT<--
-->CUT<--
===On Debian Linux 5.2.17/x86_64 (9c bench.c && 9l -o bench bench.o)=== (HP Z440, about a year old; Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz)
With RWLock:
**Best time over six runs:
: spitfire; time ./bench
real	0m9.208s
user	0m0.000s
sys	0m0.003s
: spitfire;
Raw times: 9.209, 9.193, 9.454, 9.450, 9.436, 9.208; avg 9.325, med 9.323, std dev +/- 0.134
With Lock (simple mutex):
**Best time over six runs:
: spitfire; time ./bench
real	0m6.802s
user	0m0.002s
sys	0m0.001s
: spitfire;
Raw times: 6.814, 6.808, 6.808, 6.802, 6.817, 6.802; avg 6.809, med 6.808, std dev +/- 0.006
===On macOS Mojave/x86_64 (9c bench.c && 9l -o bench bench.o)=== (2013 Mac Pro, about two years old; 3.5 GHz 6-Core Intel Xeon E5)
With RWLock:
**Best time over six runs:
: tempest; time ./bench
real	0m11.753s
user	0m0.001s
sys	0m0.002s
: tempest;
Raw run times: 11.820, 11.865, 11.776, 11.766, 11.753, 11.852; avg 11.805, med 11.798, std dev +/- 0.047
With Lock:
**Best time over six runs:
: tempest; time ./bench
real	0m8.818s
user	0m0.001s
sys	0m0.002s
: tempest;
Raw run times: 8.828, 8.825, 8.818, 8.883, 8.934, 8.863; avg 8.859, med 8.846, std dev +/- 0.045
===On FreeBSD 12.1/x86_64 (9c bench.c && 9l -o bench bench.o -lpthread)=== (VPS; Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz (2295.80-MHz K8-class CPU))
With RWLock:
**Best time over six runs:
: pingala; time ./bench
./bench  0.00s user 0.00s system 0% cpu 24.690 total
: pingala;
Raw run times: 25.693, 26.178, 24.866, 25.392, 25.690, 25.881; avg 25.617, med 25.692, std dev +/- 0.449
With Lock:
**Best time over six runs:
: pingala; time ./bench
./bench  0.00s user 0.00s system 0% cpu 15.164 total
: pingala;
Raw run times: 15.688, 15.400, 16.123, 15.164, 15.969, 16.256; avg 15.767, med 15.829, std dev +/- 0.427