clearcontainers / runtime

OCI (Open Containers Initiative) compatible runtime using Virtual Machines
Apache License 2.0

wshd can't run in clear container #342

Closed · wuzhy closed this issue 7 years ago

wuzhy commented 7 years ago

Hi,

I tried to run wshd in a Clear Container, but it failed. Has anyone hit this issue before, or does anyone know a way to work around it? Thanks.

Below is the log:

sh-4.1# strace ./wshd --run /share
execve("./wshd", ["./wshd", "--run", "/share"], [/* 8 vars */]) = 0
uname({sys="Linux", node="1f0511ff4ebce1762a45a656dff21e6f8ba01da5509c5d7c5572dbc9492a684e", ...}) = 0
brk(0) = 0x1eb1000
brk(0x1eb2180) = 0x1eb2180
arch_prctl(ARCH_SET_FS, 0x1eb1860) = 0
brk(0x1ed3180) = 0x1ed3180
brk(0x1ed4000) = 0x1ed4000
stat("/share", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
socket(PF_FILE, SOCK_STREAM, 0) = 3
unlink("/share/wshd.sock") = 0
bind(3, {sa_family=AF_FILE, path="/share/wshd.sock"}, 110) = -1 ENXIO (No such device or address)
dup(2) = 4
fcntl(4, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
fstat(4, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb6f6323000
lseek(4, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
write(4, "bind: No such device or address\n", 32) = 32
close(4) = 0
munmap(0x7fb6f6323000, 4096) = 0
brk(0x1ed3000) = 0x1ed3000
exit_group(1) = ?
sh-4.1# uname -a
Linux 1f0511ff4ebce1762a45a656dff21e6f8ba01da5509c5d7c5572dbc9492a684e 4.5.0-50.container #1 SMP Mon Oct 24 22:24:01 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
sh-4.1# cat /etc/system-release
CentOS release 6.4 (Final)
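For reference, the failing sequence in the trace boils down to bind()ing an AF_UNIX socket at a path on the shared directory. A minimal sketch in C that reproduces just those calls (the path mirrors the trace; this is an illustration, not wshd's actual source):

/* repro.c - reproduce the socket/unlink/bind sequence from the trace */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);   /* AF_UNIX == PF_FILE in the trace */
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, "/share/wshd.sock", sizeof(addr.sun_path) - 1);

    unlink(addr.sun_path);                      /* wshd removes any stale socket first */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");                         /* ENXIO here when /share is a 9p mount */
        close(fd);
        return 1;
    }
    printf("bound %s\n", addr.sun_path);
    close(fd);
    return 0;
}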

grahamwhaley commented 7 years ago

Hi @wuzhy we've not tried wshd I'm afraid, so have no direct experience. Seeing:

unlink("/share/wshd.sock") = 0

in the logs makes me think of a 9pfs issue with unlinked files (see https://github.com/01org/cc-oci-runtime/issues/152 for a similar thread).

In that case, we verified the issue by mounting a ramfs on /tmp to see if that worked around the problem:

# mount -t ramfs -o size=20M ramfs /tmp

I can't see how /share is placed inside your container, but my guess is that it was mounted via a docker command and is therefore a 9p mount. Can you check that for us with the mount command? And if possible, maybe you can verify using the ramfs trick above (see the sketch below)?
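The same over-mount check can be done programmatically; a minimal sketch using mount(2) directly (mounting over /share rather than /tmp is my assumption here, and it needs CAP_SYS_ADMIN):

/* overmount.c - equivalent of: mount -t ramfs -o size=20M ramfs /share */
#include <stdio.h>
#include <sys/mount.h>

int main(void) {
    /* Note: this over-mounts the 9p share, hiding its contents until umount. */
    if (mount("ramfs", "/share", "ramfs", 0, "size=20M") < 0) {
        perror("mount");
        return 1;
    }
    printf("ramfs mounted on /share\n");
    return 0;
}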

wuzhy commented 7 years ago

@grahamwhaley Yes, the /share directory is mounted via docker run ... -v /root/wsh/share:/share ....

sh-4.1# mount
hyperShared on /share type 9p (rw,sync,dirsync,nodev,relatime,trans=virtio)
sh-4.1# ls /share
wsh  wsh.tar.gz  wshd
sh-4.1# mount -t ramfs -o size=20M ramfs /share
sh-4.1# mount
hyperShared on /share type 9p (rw,sync,dirsync,nodev,relatime,trans=virtio)
ramfs on /share type ramfs (rw,relatime,size=20M)
sh-4.1# ls /share

The original files and directories in /share can't be seen now.

sboeuf commented 7 years ago

@wuzhy what do you mean by "now"? Have you seen a recent change?

wuzhy commented 7 years ago

@sboeuf I mean that after the command "mount -t ramfs -o size=20M ramfs /share" completes, there are no files or directories under "/share".

wuzhy commented 7 years ago

@grahamwhaley Yes, the method you mentioned works around it. But hasn't this 9p issue been fixed upstream yet?

sh-4.1# strace ./wshd --run /share
execve("./wshd", ["./wshd", "--run", "/share"], [/* 8 vars */]) = 0
uname({sys="Linux", node="c58b86197c50fe5a26184bee4ab27928cc74575ae9360e347c6d712e89c39432", ...}) = 0
brk(0) = 0x16c7000
brk(0x16c8180) = 0x16c8180
arch_prctl(ARCH_SET_FS, 0x16c7860) = 0
brk(0x16e9180) = 0x16e9180
brk(0x16ea000) = 0x16ea000
stat("/share", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
socket(PF_FILE, SOCK_STREAM, 0) = 3
unlink("/share/wshd.sock") = 0
bind(3, {sa_family=AF_FILE, path="/share/wshd.sock"}, 110) = 0
listen(3, 5) = 0
close(0) = 0
close(1) = 0
close(2) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
signalfd4(-1, [CHLD], 8, O_NONBLOCK|O_CLOEXEC) = 0
select(1024, [0 3], NULL, NULL, NULL

dlespiau commented 7 years ago

Is your plan to use an AF_UNIX socket shared between the host and guest? I can't see that working across two different kernels (host and guest): the guest kernel doesn't know about kernel objects from the host. It works with containers because they share the same kernel.

Or am I missing something?

wuzhy commented 7 years ago

Yes, we use wshd to communicate between guest and host. Is there any other way to do that?
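One option for crossing the host/guest kernel boundary (not something this thread confirms Clear Containers exposed at the time) is a virtio-vsock socket, which is designed for exactly this case. A minimal guest-side sketch, assuming the guest kernel has vsock support; the port number is made up:

/* vsock_listen.c - guest-side listener, the vsock analogue of wshd's unix socket */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_VSOCK, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_vm addr;
    memset(&addr, 0, sizeof(addr));
    addr.svm_family = AF_VSOCK;
    addr.svm_cid = VMADDR_CID_ANY;   /* accept from any CID; the host is CID 2 */
    addr.svm_port = 1234;            /* hypothetical port number */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        close(fd);
        return 1;
    }
    listen(fd, 5);
    /* accept() and serve connections here, as wshd does on its unix socket */
    close(fd);
    return 0;
}

Unlike an AF_UNIX socket on a shared filesystem, this addresses the endpoint by (CID, port) rather than a path, so it doesn't depend on 9p at all.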

wuzhy commented 7 years ago

This issue got fixed, so closing it now.