Open fhs opened 4 years ago
p9p factotum definitely contains a Heisenbug. On FreeBSD, "factotum -ndD" hangs after "rpc start". The answer "ok" is not sent. When I remove the threadcreate for convthread, I get the "ok", but the next step hangs again. I then tried running factotum under valgrind, and the same program just works as expected.
The valgrind log is:
==1217== Using Valgrind-3.17.0.GIT and LibVEX; rerun with -h for copyright info
==1217== Command: factotum -n
==1217==
==1217== Warning: ignored attempt to set SIGSTOP handler in sigaction();
==1217== the SIGSTOP signal is uncatchable
==1217== Conditional jump or move depends on uninitialised value(s)
==1217== at 0x486FD93: ??? (in /lib/libthr.so.3)
==1217== by 0x241609: _threadsetupdaemonize (daemonize.c:147)
==1217== by 0x2406A0: p9main (thread.c:840)
==1217== by 0x24D118: main (main.c:10)
==1217==
==1217==
factotum seems to be failing to do p9sk1 correctly most of the time. For example, when trying to connect to u9fs (running with `-a p9any`), it succeeds only 7/20 times here:
I was also able to reproduce the issue trying to connect to fossil serving 9P. I was NOT able to reproduce the issue with native Plan 9 factotum. The code also looks very different compared to p9p factotum. The issue only becomes apparent on Plan 9 when I `bind /mnt/term/mnt/factotum /mnt/factotum` in drawterm.
When authentication fails, factotum 9P protocol trace shows this:
Digging a bit deeper, it looks like the factotum p9sk1 client (in `convthread`) is in the `write ticket+auth` state and setting `c->rpc.op = RpcUnknown` in `rpc.c:/^rpcrespondn`. Then, `fs.c:/^fsread` (in the `fsreqthread` thread) is checking if `c->rpc.op == RpcUnknown` and thus fails with `no rpc pending`. The `Conv *c` is shared between the two threads. Shouldn't there be a lock?
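To make the locking question concrete, here is a minimal sketch of the general idea, not factotum's actual code and not a proposed patch. It assumes POSIX threads instead of p9p libthread, and only `Conv`, `op`, and `RpcUnknown` are borrowed from the description above; `RpcRead`, `convside`, `fsside`, `runconv`, and the counters are hypothetical names for illustration. Every read and write of `op` happens with the lock held, so the side that checks for `RpcUnknown` can never observe the field in the middle of a transition made by the other side.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the RPC states mentioned in the issue. */
enum { RpcUnknown, RpcRead };

typedef struct Conv Conv;
struct Conv {
	pthread_mutex_t lk;      /* guards every field below */
	pthread_cond_t changed;  /* signalled whenever op changes */
	int op;                  /* pending RPC, or RpcUnknown if none */
	int nreq;                /* requests posted so far */
	int nresp;               /* responses sent so far */
	int total;               /* number of request/response rounds */
};

/* Conversation side: wait for a pending RPC, answer it, clear op. */
static void *
convside(void *v)
{
	Conv *c = v;

	pthread_mutex_lock(&c->lk);
	while (c->nresp < c->total) {
		while (c->op == RpcUnknown)
			pthread_cond_wait(&c->changed, &c->lk);
		c->nresp++;
		c->op = RpcUnknown;  /* cleared under the same lock it is tested under */
		pthread_cond_broadcast(&c->changed);
	}
	pthread_mutex_unlock(&c->lk);
	return NULL;
}

/* File-server side: wait until no RPC is pending, then post one. */
static void *
fsside(void *v)
{
	Conv *c = v;

	pthread_mutex_lock(&c->lk);
	while (c->nreq < c->total) {
		while (c->op != RpcUnknown)
			pthread_cond_wait(&c->changed, &c->lk);
		c->op = RpcRead;
		c->nreq++;
		pthread_cond_broadcast(&c->changed);
	}
	pthread_mutex_unlock(&c->lk);
	return NULL;
}

/* Run `total` rounds; return the number of responses, or -1 on error. */
int
runconv(int total)
{
	Conv c;
	pthread_t conv;

	pthread_mutex_init(&c.lk, NULL);
	pthread_cond_init(&c.changed, NULL);
	c.op = RpcUnknown;
	c.nreq = 0;
	c.nresp = 0;
	c.total = total;

	if (pthread_create(&conv, NULL, convside, &c) != 0)
		return -1;
	fsside(&c);  /* the caller plays the file-server side */
	pthread_join(conv, NULL);

	assert(c.nreq == c.nresp);  /* no request was lost or answered twice */
	pthread_cond_destroy(&c.changed);
	pthread_mutex_destroy(&c.lk);
	return c.nresp;
}
```

Built with `-pthread`, a call such as `runconv(1000)` returns 1000 when the thread can be created, because each posted request is matched by exactly one response. Whether the same discipline applies directly to factotum depends on how libthread schedules `convthread` and `fsreqthread`, which this sketch does not model.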