garlick opened 4 years ago
Just focusing in on `t05`, which is the first failing test: we do get a hang on kernel 5.10.63.
The test script only runs `/bin/true`, and as expected, it succeeded.
Test output `t05.out`:
```
kconjoin: diodmount exited with rc=0
kconjoin: t05 exited with rc=0
```
and the diod log `t05.diod` contains:
```
diod: P9_TVERSION tag 65535 msize 65536 version '9P2000.L'
diod: P9_RVERSION tag 65535 msize 65536 version '9P2000.L'
diod: P9_TAUTH tag 0 afid 0 uname '' aname '/tmp/tmp.vBZM9TQ04J' n_uname 0
diod: P9_RLERROR tag 0 ecode 2
diod: P9_TATTACH tag 0 fid 0 afid -1 uname '' aname '/tmp/tmp.vBZM9TQ04J' n_uname 0
diod: P9_RATTACH tag 0 qid (000000000001fcac 0 'd')
diod: P9_TCLUNK tag 0 fid 0
diod: P9_RCLUNK tag 0
diod: P9_TVERSION tag 65535 msize 65536 version '9P2000.L'
diod: P9_RVERSION tag 65535 msize 65536 version '9P2000.L'
diod: P9_TATTACH tag 0 fid 0 afid -1 uname 'root' aname '/tmp/tmp.vBZM9TQ04J' n_uname P9_NONUNAME
diod: P9_RATTACH tag 0 qid (000000000001fcac 0 'd')
diod: P9_TGETATTR tag 0 fid 0 request_mask 0x7ff
diod: P9_RGETATTR tag 0 valid 0x7ff qid (000000000001fcac 0 'd') mode 040755 uid 0 gid 0 nlink 2 rdev 0 size 4096 blksize 4096 blocks 8 atime Tue Oct 26 16:52:16 2021 mtime Tue Oct 26 16:52:16 2021 ctime Tue Oct 26 16:52:16 2021 btime X gen X data_version X
```
gdb says kconjoin is stuck here:
```
(gdb) bt
#0 0xb6ddb2a8 in __GI___waitpid (pid=pid@entry=18484,
    stat_loc=stat_loc@entry=0xbee1d218, options=options@entry=0)
    at ../sysdeps/unix/sysv/linux/waitpid.c:30
#1 0xb6d744ec in do_system (
    line=line@entry=0xbee1d652 "../../diod/diod -r80 -w81 -c /dev/null -n -d 1 -L t05.diod -e /tmp/tmp.vBZM9TQ04J") at ../sysdeps/posix/system.c:149
#2 0xb6d749c4 in __libc_system (
    line=line@entry=0xbee1d652 "../../diod/diod -r80 -w81 -c /dev/null -n -d 1 -L t05.diod -e /tmp/tmp.vBZM9TQ04J") at ../sysdeps/posix/system.c:185
#3 0x00010a40 in main (argc=<optimized out>, argv=<optimized out>)
    at kconjoin.c:133
```
and diod here:
```
(gdb) bt
#0 futex_wait_cancelable (private=0, expected=0, futex_word=0x1f52ffc)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x0, cond=0x1f52fd0)
    at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x1f52fd0, mutex=0x0) at pthread_cond_wait.c:655
#3 0x00028bec in np_srv_wait_conncount (srv=0x1f52f18, count=count@entry=1)
    at srv.c:141
#4 0x00012dbc in _service_run (wfdno=-1227925284, rfdno=<optimized out>,
    mode=SRV_FILEDES) at diod.c:666
#5 main (argc=<optimized out>, argv=<optimized out>) at diod.c:257
```
which is this function:
```c
/* Block the caller until the server has no active connections,
 * and there have been at least 'count' connections historically.
 */
void
np_srv_wait_conncount(Npsrv *srv, int count)
{
    xpthread_mutex_lock(&srv->lock);
    while (srv->conncount > 0 || srv->connhistory < count) {
        xpthread_cond_wait(&srv->conncountcond, &srv->lock);
    }
    xpthread_mutex_unlock(&srv->lock);
}
```
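To make the hang concrete, here is a minimal, self-contained sketch of the same wait pattern, assuming the counters are bumped roughly as below when a connection opens and closes. The names `conn_srv`, `conn_open`, `conn_close` and `wait_conncount_timed` are hypothetical and not diod's API, and the bounded wait via `pthread_cond_timedwait` is an illustrative variant, not what diod does. The point is that only a hang-up (`conn_close`) can satisfy the predicate, so if the kernel keeps the 9p mount and its connection alive, the waiter never returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for the Npsrv fields used above. */
struct conn_srv {
    pthread_mutex_t lock;
    pthread_cond_t conncountcond;
    int conncount;   /* connections currently open */
    int connhistory; /* connections ever opened */
};

void conn_srv_init(struct conn_srv *s)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->conncountcond, NULL);
    s->conncount = 0;
    s->connhistory = 0;
}

/* A client (e.g. the kernel's fd transport) connects. */
void conn_open(struct conn_srv *s)
{
    pthread_mutex_lock(&s->lock);
    s->conncount++;
    s->connhistory++;
    pthread_cond_broadcast(&s->conncountcond);
    pthread_mutex_unlock(&s->lock);
}

/* The client hangs up; only this can release the waiter. */
void conn_close(struct conn_srv *s)
{
    pthread_mutex_lock(&s->lock);
    s->conncount--;
    pthread_cond_broadcast(&s->conncountcond);
    pthread_mutex_unlock(&s->lock);
}

/* Same predicate as np_srv_wait_conncount(), but bounded by 'secs'.
 * Returns 0 if the predicate was satisfied, ETIMEDOUT otherwise. */
int wait_conncount_timed(struct conn_srv *s, int count, int secs)
{
    struct timespec deadline;
    int rc = 0;

    /* The default condvar clock is CLOCK_REALTIME. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += secs;

    pthread_mutex_lock(&s->lock);
    while ((s->conncount > 0 || s->connhistory < count) && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&s->conncountcond, &s->lock, &deadline);
    /* Re-check under the lock: a wakeup may race with the timeout. */
    rc = (s->conncount > 0 || s->connhistory < count) ? ETIMEDOUT : 0;
    pthread_mutex_unlock(&s->lock);
    return rc;
}
```

Usage (compile with `-pthread`): after one `conn_open()`, `wait_conncount_timed(&s, 1, 1)` returns `ETIMEDOUT` as long as `conn_close()` is never called, which is the situation gdb shows with `conncount` stuck at 1; after `conn_close()` it returns 0.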
The connection count is 1:
```
(gdb) frame 3
#3 0x00028bec in np_srv_wait_conncount (srv=0x1f52f18, count=count@entry=1)
    at srv.c:141
141         xpthread_cond_wait(&srv->conncountcond, &srv->lock);
(gdb) p srv->conncount
$1 = 1
```
So the kernel does not clunk the mount and close its connection to diod when the test program completes.
The 9p mount made in the private namespace established with `CLONE_NEWNS` appears to be leaking, since it is visible to all in `/proc/mounts`:
```
$ cat /proc/mounts|grep 9p
nohost:/tmp/tmp.YRvf1AVI4r /tmp/tmp.kbqx8vsreA 9p rw,sync,dirsync,relatime,debug=1,uname=root,aname=/tmp/tmp.YRvf1AVI4r,access=user,msize=65536,trans=fd,rfd=80,wfd=81 0 0
```
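The same check can be made from C, which might be handy for a harness that wants to detect leaked mounts after each test. This is a minimal sketch using glibc's `setmntent`/`getmntent`; `count_mounts_of_type()` is a hypothetical helper, not part of the diod test suite.

```c
#define _DEFAULT_SOURCE
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Count entries of filesystem type 'fstype' in a mount table such as
 * /proc/self/mounts, reporting each match on stderr.
 * Returns -1 if the table cannot be opened. */
int count_mounts_of_type(const char *table, const char *fstype)
{
    FILE *fp = setmntent(table, "r");
    struct mntent *m;
    int n = 0;

    if (!fp)
        return -1;
    while ((m = getmntent(fp)) != NULL) {
        if (strcmp(m->mnt_type, fstype) == 0) {
            fprintf(stderr, "%s mounted on %s\n", m->mnt_fsname, m->mnt_dir);
            n++;
        }
    }
    endmntent(fp);
    return n;
}
```

For example, `count_mounts_of_type("/proc/self/mounts", "9p")` run in the parent namespace after the test would be expected to count the leaked `nohost:/tmp/tmp.YRvf1AVI4r` entry above.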
Running `sudo umount /tmp/tmp.kbqx8vsreA` allows the test to complete successfully.
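The manual unmount could also be done from a test harness. A minimal sketch, assuming the caller has `CAP_SYS_ADMIN`; `detach_mount()` is a hypothetical helper, not part of kconjoin. It calls `umount2(2)` with `MNT_DETACH` (a lazy unmount, like `umount -l`), chosen here on the assumption that a stale mount may still be busy; passing `0` as the flags argument would mirror the plain `umount` above more closely.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Lazily detach a leftover mount point (like `umount -l <target>`).
 * Returns 0 on success, -1 on failure with errno set and a message
 * printed to stderr. */
int detach_mount(const char *target)
{
    if (umount2(target, MNT_DETACH) < 0) {
        int saved = errno;
        fprintf(stderr, "umount2 %s: %s\n", target, strerror(saved));
        errno = saved;
        return -1;
    }
    return 0;
}
```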
This seems to resolve the issue:
```diff
diff --git a/tests/kern/kconjoin.c b/tests/kern/kconjoin.c
index 83f08b0..4e32342 100644
--- a/tests/kern/kconjoin.c
+++ b/tests/kern/kconjoin.c
@@ -114,6 +114,13 @@ main (int argc, char *argv[])
_movefd (fromsrv[0], RFDNO);
if (unshare (CLONE_NEWNS) < 0)
err_exit ("unshare");
+ /* Change root propagation to private within this namespace,
+ * as systemd may have mounted root with it set to shared,
+ * and then the 9p mount will leak into the main namespace and
+ * not be automatically unmounted when the test completes.
+ */
+ system ("mount --make-private /");
+
if ((cs = system (mntcmd)) == -1)
err_exit ("failed to run %s", _cmd (mntcmd));
if (_interpret_status (cs, _cmd (mntcmd)))
```
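The patch shells out to `mount --make-private /` and ignores the return value of `system()`. A minimal sketch of the same step done directly with `mount(2)`, with errors reported; `enter_private_mount_ns()` is a hypothetical name. It adds `MS_REC`, which is the `mount --make-rprivate /` behavior rather than the patch's non-recursive `--make-private /`. Whether recursion matters here is an assumption: if `/tmp` is a separate mount that is also shared, changing only `/` may not stop the 9p mount from propagating, but that has not been verified in this thread.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/mount.h>

/* Enter a new mount namespace, then mark every mount in it private so
 * that mounts made afterwards do not propagate to the parent namespace.
 * MS_REC applies the change to the whole tree under "/"; source, fstype
 * and data are ignored for propagation changes.
 * Returns 0 on success, -1 on failure (requires CAP_SYS_ADMIN). */
int enter_private_mount_ns(void)
{
    if (unshare(CLONE_NEWNS) < 0) {
        perror("unshare");
        return -1;
    }
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) < 0) {
        perror("mount --make-rprivate /");
        return -1;
    }
    return 0;
}
```

In kconjoin this would take the place of the separate `unshare()` call and the `system()` line, with a `-1` return passed to `err_exit()`.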
There are still some cleanup issues with that fix applied. After running the test, I get:
```
$ df
df: /tmp/tmp.vExWhjHnlw: Input/output error
df: /tmp/tmp.xsXEoaK3x6: Input/output error
df: /tmp/tmp.BUHhy3kS6C: Input/output error
df: /tmp/tmp.EVMsv8hhgf: Input/output error
df: /tmp/tmp.orlCjxATgT: Input/output error
df: /tmp/tmp.F8bpFfCXZL: Input/output error
df: /tmp/tmp.REyd5kqrJh: Input/output error
df: /tmp/tmp.wF4OUnufwA: Input/output error
df: /tmp/tmp.cMaEsks0sl: Input/output error
df: /tmp/tmp.AaFdxFdXw2: Input/output error
df: /tmp/tmp.Hx98fIqnQH: Input/output error
df: /tmp/tmp.rTdvkUeHyg: Input/output error
df: /tmp/tmp.SGGAtLbvHa: Input/output error
df: /tmp/tmp.iii51W8iiM: Input/output error
df: /tmp/tmp.8NZQKsmVQj: Input/output error
df: /tmp/tmp.FavfPS4Ffx: Input/output error
df: /tmp/tmp.ouYPdkmvU5: Input/output error
df: /tmp/tmp.LLXGbJAnaN: Input/output error
```
Running `make check` as root down in `tests/kern` hangs at test `t05`. My kernel is 5.4.0-7634-generic (Ubuntu 20.04 LTS). This may be a dup of #23, which was against linux-next in 2015, but I wanted to open a new bug until that is confirmed.