clearcontainers / runtime

OCI (Open Containers Initiative) compatible runtime using Virtual Machines
Apache License 2.0

Clearlinux: Clear Container 3.0 fails to setup networking on Clearlinux #667

Open mcastelino opened 6 years ago

mcastelino commented 6 years ago

Description of problem

docker run -itd alpine sh

Expected result

Container up with networking

Actual result

docker run -itd alpine sh
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
88286f41530e: Pull complete
Digest: sha256:f006ecbb824d87947d0b51ab8488634bf69fe4094959d935c0c103f4820a417d
Status: Downloaded newer image for alpine:latest
27b8436f3f8ece3ff4a9a8db2db6d83eab57518e9749e316c7b0e9d5baf11e66
docker: Error response from daemon: oci runtime error: Could not get veth interface: Incorrect link type gre, expecting veth.

Runtime logs

2017-10-02 15:19:07.008133809 -0700 PDT m=+0.001121649:16150:cc-runtime:info:cc-runtime (version 3.0.0-beta.2, commit ) called as: [create --bundle /var/run/docker/libcontainerd/27b8436f3f8ece3ff4a9a8db2db6d83eab57518e9749e316c7b0e9d5baf11e66 --console /dev/pts/0 --pid-file /run/docker/libcontainerd/containerd/27b8436f3f8ece3ff4a9a8db2db6d83eab57518e9749e316c7b0e9d5baf11e66/init/pid 27b8436f3f8ece3ff4a9a8db2db6d83eab57518e9749e316c7b0e9d5baf11e66]
2017-10-02 15:19:07.008381648 -0700 PDT m=+0.001369524:16150:cc-runtime:info:Using configuration file "/usr/share/defaults/clear-containers/configuration.toml"
2017-10-02 15:19:07.009681727 -0700 PDT m=+0.002669549:16150:cc-runtime:info:No sockets from configuration
2017-10-02 15:19:07.009851901 -0700 PDT m=+0.002839721:16150:cc-runtime:info:Device details for container 27b8436f3f8ece3ff4a9a8db2db6d83eab57518e9749e316c7b0e9d5baf11e66: Major:0, Minor:44, MountPoint:/var/lib/docker/overlay/bc511cc292a79cdc00b1d680ce267547e2da0b6f95fcc5d7219a226c107cfbbf/merged
2017-10-02 15:19:07.070899403 -0700 PDT m=+0.063887258:16150:cc-runtime:error:Could not get veth interface: Incorrect link type gre, expecting veth
Could not get veth interface: Incorrect link type gre, expecting veth
2017-10-02 15:19:07.073439752 -0700 PDT m=+0.000967428:16183:cc-runtime:info:cc-runtime (version 3.0.0-beta.2, commit ) called as: [delete 27b8436f3f8ece3ff4a9a8db2db6d83eab57518e9749e316c7b0e9d5baf11e66]
2017-10-02 15:19:07.073643172 -0700 PDT m=+0.001170877:16183:cc-runtime:info:Using configuration file "/usr/share/defaults/clear-containers/configuration.toml"
2017-10-02 15:19:07.075847408 -0700 PDT m=+0.003375132:16183:cc-runtime:error:open /run/virtcontainers/pods/27b8436f3f8ece3ff4a9a8db2db6d83eab57518e9749e316c7b0e9d5baf11e66/network.json: no such file or directory
2017-10-02 15:19:07.096898655 -0700 PDT m=+0.001003093:16188:cc-runtime:info:cc-runtime (version 3.0.0-beta.2, commit ) called as: [delete 27b8436f3f8ece3ff4a9a8db2db6d83eab57518e9749e316c7b0e9d5baf11e66]
2017-10-02 15:19:07.097082209 -0700 PDT m=+0.001186593:16188:cc-runtime:info:Using configuration file "/usr/share/defaults/clear-containers/configuration.toml"
2017-10-02 15:19:07.098754188 -0700 PDT m=+0.002858603:16188:cc-runtime:error:Can not move from stopped to stopped

The netns is actually set up:

ip netns
cni-7492e23e-e2c8-da73-ecae-614bb32c7212 (id: 1)

The Docker logs show the networking pre-hook being called:

dockerd[16638]: time="2017-10-02T15:38:12.983629066-07:00" level=error msg="Create container failed with error: oci runtime error: Could not get veth interface: Incorrect link type gre, expecting veth"

dockerd[16638]: time="2017-10-02T15:38:12.953281889-07:00" level=debug msg="sandbox set key processing took 54.546492ms for container ae70344d8715b72f2787c290519aec1b7b98d0073a3e6c54ba7f6c40c0d4cc6c"

However, it looks like either the interface was set up in the wrong namespace, or we scanned the wrong namespace when the hook returned.

Note: QEMU is also left running even though the container was never launched.

mcastelino commented 6 years ago

docker version
Client:
 Version:      17.05.0-ce
 API version:  1.29
 Go version:   go1.9
 Git commit:   7392c3b0ce0f9d3e918a321c66668c5d1ef4f689
 Built:        Wed Sep 20 09:30:15 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.05.0-ce
 API version:  1.29 (minimum version 1.12)
 Go version:   go1.9
 Git commit:   7392c3b0ce0f9d3e918a321c66668c5d1ef4f689
 Built:        Wed Sep 20 09:30:15 2017
 OS/Arch:      linux/amd64
 Experimental: false

cc-runtime version
cc-runtime  : 3.0.0-beta.2
   commit   : <<unknown>>
   OCI specs: 1.0.0-rc5

mcastelino commented 6 years ago

Note: The Docker-supplied packages are built with a different version of Go:

docker version
Client:
 Version:      17.06.0-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:23:42 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.06.0-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:25:02 2017
 OS/Arch:      linux/amd64
 Experimental: false

mcastelino commented 6 years ago

According to dmesg, Docker is actually creating the interfaces. It looks like it is creating them in the wrong namespace, or maybe even on the host itself. The dmesg logs show both the creation and the deletion:

[ 9667.861101] docker0: port 1(veth7f1faef) entered blocking state
[ 9667.861103] docker0: port 1(veth7f1faef) entered disabled state
[ 9667.861156] device veth7f1faef entered promiscuous mode
[ 9667.861256] IPv6: ADDRCONF(NETDEV_UP): veth7f1faef: link is not ready
[ 9667.861257] docker0: port 1(veth7f1faef) entered blocking state
[ 9667.861259] docker0: port 1(veth7f1faef) entered forwarding state
[ 9667.861813] docker0: port 1(veth7f1faef) entered disabled state
[ 9667.914435] eth0: renamed from vethcaa37ad
[ 9667.925591] IPv6: ADDRCONF(NETDEV_CHANGE): veth7f1faef: link becomes ready
[ 9667.925674] docker0: port 1(veth7f1faef) entered blocking state
[ 9667.925675] docker0: port 1(veth7f1faef) entered forwarding state
[ 9667.958828] vethcaa37ad: renamed from eth0
[ 9667.970392] docker0: port 1(veth7f1faef) entered disabled state
[ 9667.983367] docker0: port 1(veth7f1faef) entered disabled state
[ 9667.985221] device veth7f1faef left promiscuous mode
[ 9667.985224] docker0: port 1(veth7f1faef) entered disabled state

mcastelino commented 6 years ago

It looks like we have a bug in our network namespace scanning logic. If the namespace contains interfaces we do not expect, such as the GRE interfaces that appear when GRE tunnels are created on the host, we fail instead of ignoring them and moving on. We need to change the scanning logic to ignore interfaces such as gre0 and gretap0, which are side effects of tunnel creation on the host:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN group default qlen 1000
    link/gre 0.0.0.0 brd 0.0.0.0
3: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
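
The scanning change described above could be sketched roughly as follows. This is only an illustration, not the actual virtcontainers code: the `link` struct, the `vethLinks` function, and the `"gretap"` and `"loopback"` type strings are hypothetical stand-ins for whatever the netlink layer really reports. Only the `gre` and `veth` type names come from the error message in this issue.

```go
package main

import "fmt"

// link is a hypothetical stand-in for an interface found while scanning
// the container's network namespace.
type link struct {
	Name string
	Type string // e.g. "veth" or "gre"; other values here are illustrative
}

// vethLinks keeps only the veth interfaces and silently skips everything
// else, instead of returning an error on the first unexpected link type.
func vethLinks(links []link) []link {
	var out []link
	for _, l := range links {
		if l.Type != "veth" {
			// Side-effect interfaces such as gre0 and gretap0 land here.
			continue
		}
		out = append(out, l)
	}
	return out
}

func main() {
	// Mirrors the namespace contents listed above, plus the container's veth.
	scanned := []link{
		{Name: "lo", Type: "loopback"},
		{Name: "gre0", Type: "gre"},
		{Name: "gretap0", Type: "gretap"},
		{Name: "eth0", Type: "veth"},
	}
	for _, l := range vethLinks(scanned) {
		fmt.Printf("using %s (%s)\n", l.Name, l.Type)
	}
}
```

With this shape, a namespace that contains no veth at all yields an empty list, so the caller can still report that as an error, but in one place and with full knowledge of what was scanned.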

grahamwhaley commented 6 years ago

/cc @jcvenegas for the 'QEMU left running' bit, as IIRC he was looking at that area recently.

mcastelino commented 6 years ago

Fixed by https://github.com/containers/virtcontainers/pull/394. Needs re-vendoring.