clearcontainers / runtime

OCI (Open Containers Initiative) compatible runtime using Virtual Machines
Apache License 2.0

Does k8s+docker+cc work normally now? #888

Open miaoyq opened 6 years ago

miaoyq commented 6 years ago

Description of problem

In my host (a VM), docker+cc works well and I can create containers normally. But my host network breaks when I create a pod.

I execute a command like:

# kubectl create -f myapp.yaml

The content of myapp.yaml:

# cat myapp.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
  - name: myapp-container
    image: busybox
    command: ['sh', '-c', 'echo Hello Kubernetes! && sleep 3600']

After executing this command, my host network is broken.

Expected result

The pod is created and the host network is unaffected.

Actual result

The host loses network connectivity: the ens3 and ens6 interfaces disappear from the host (see the ip addr output below).


cc-collect-data.sh

Meta details

Running cc-collect-data.sh version 3.0.12 (commit d9f04c9) at 2018-01-03.21:33:31.355704102.


Runtime is /usr/bin/cc-runtime.

cc-env

Output of "/usr/bin/cc-runtime cc-env":

[Meta]
  Version = "1.0.6"

[Runtime]
  Debug = false
  [Runtime.Version]
    Semver = "3.0.12"
    Commit = "d9f04c9"
    OCI = "1.0.0-dev"
  [Runtime.Config]
    Path = "/usr/share/defaults/clear-containers/configuration.toml"

[Hypervisor]
  MachineType = "pc"
  Version = "QEMU emulator version 2.7.1(2.7.1+git.d4a337fe91-9.cc), Copyright (c) 2003-2016 Fabrice Bellard and the QEMU Project developers"
  Path = "/usr/bin/qemu-lite-system-x86_64"
  Debug = false

[Image]
  Path = "/usr/share/clear-containers/clear-19790-containers.img"

[Kernel]
  Path = "/usr/share/clear-containers/vmlinuz-4.9.60-82.container"
  Parameters = ""

[Proxy]
  Type = "ccProxy"
  Version = "Version: 3.0.12+git.3c6daa6"
  Path = "/usr/libexec/clear-containers/cc-proxy"
  Debug = false

[Shim]
  Type = "ccShim"
  Version = "shim version: 3.0.12 (commit: d01f9a7)"
  Path = "/usr/libexec/clear-containers/cc-shim"
  Debug = false

[Agent]
  Type = "hyperstart"
  Version = "<<unknown>>"

[Host]
  Kernel = "4.10.0-42-generic"
  CCCapable = false
  [Host.Distro]
    Name = "Ubuntu"
    Version = "16.04"
  [Host.CPU]
    Vendor = "GenuineIntel"
    Model = "QEMU Virtual CPU version 2.0.0"

Runtime config files

Runtime default config files

/usr/share/defaults/clear-containers/configuration.toml

Runtime config file contents

Config file /etc/clear-containers/configuration.toml not found.

Output of "cat "/usr/share/defaults/clear-containers/configuration.toml"":

# XXX: Warning: this file is auto-generated from file "config/configuration.toml.in".

[hypervisor.qemu]
path = "/usr/bin/qemu-lite-system-x86_64"
kernel = "/usr/share/clear-containers/vmlinuz.container"
image = "/usr/share/clear-containers/clear-containers.img"
machine_type = "pc"
# Optional space-separated list of options to pass to the guest kernel.
# For example, use `kernel_params = "vsyscall=emulate"` if you are having
# trouble running pre-2.15 glibc
kernel_params = ""

# Path to the firmware.
# If you want qemu to use the default firmware, leave this option empty.
firmware = ""

# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators=""

# Default number of vCPUs per POD/VM:
# unspecified or 0 --> will be set to 1
# < 0              --> will be set to the actual number of physical cores
# > 0 <= 255       --> will be set to the specified number
# > 255            --> will be set to 255
default_vcpus = -1

# Bridges can be used to hot plug devices.
# Limitations:
# * Currently only pci bridges are supported
# * Up to 30 devices per bridge can be hot plugged.
# * Up to 5 PCI bridges can be cold plugged per VM.
#   This limitation could be a bug in qemu or in the kernel
# Default number of bridges per POD/VM:
# unspecified or 0   --> will be set to 1
# > 1 <= 5           --> will be set to the specified number
# > 5                --> will be set to 5
default_bridges = 1

# Default memory size in MiB for POD/VM.
# If unspecified then it will be set to 2048 MiB.
#default_memory = 2048
disable_block_device_use = false

# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
# This is useful when you want to reserve all the memory
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true

# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
# being allocated using huge pages.
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically 
# result in memory pre allocation
#enable_hugepages = true

# Enable swap of vm memory. Default false.
# The behaviour is undefined if mem_prealloc is also set to true
#enable_swap = true

# Debug changes the default hypervisor and kernel parameters to
# enable debug output where available.
# Default false
# these logs can be obtained in the cc-proxy logs  when the 
# proxy is set to run in debug mode
# /usr/libexec/clear-containers/cc-proxy -log debug
# or by stopping the cc-proxy service and running the cc-proxy 
# explicitly using the same command line
# 
#enable_debug = true

# Disable the customizations done in the runtime when it detects
# that it is running on top of a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
# 
#disable_nesting_checks = true

[proxy.cc]
path = "/usr/libexec/clear-containers/cc-proxy"

# If enabled, proxy messages will be sent to the system log
# (default: disabled)
#enable_debug = true

[shim.cc]
path = "/usr/libexec/clear-containers/cc-shim"

# If enabled, shim messages will be sent to the system log
# (default: disabled)
#enable_debug = true

[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true

Logfiles

Runtime logs

No recent runtime problems found in system journal.

Proxy logs

No recent proxy problems found in system journal.

Shim logs

No recent shim problems found in system journal.


Container manager details

Have docker

Docker

Output of "docker info":

Containers: 14
 Running: 0
 Paused: 0
 Stopped: 14
Images: 181
Server Version: 17.06.0-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 272
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: yj55coyt00765yplt82ehvqxn
 Is Manager: true
 ClusterID: wbgrn39j44b8bdpalyeqneihi
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Root Rotation In Progress: false
 Node Address: 172.120.0.216
 Manager Addresses:
  172.120.0.216:2377
Runtimes: cc-runtime runc
Default Runtime: cc-runtime
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.10.0-42-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.794GiB
Name: open-paas
ID: YRVZ:526T:HT2C:VFKB:B3X2:N3PM:AF72:75PS:G33V:ZNVR:EFT6:NBLK
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 34
 Goroutines: 139
 System Time: 2018-01-03T21:33:31.751524809+08:00
 EventsListeners: 0
Username: miaoyq
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Have kubectl

Kubernetes

Output of "kubectl config view":

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: REDACTED
    server: https://192.168.122.198:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubernetes-admin
  name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED

Packages

Have dpkg

Output of "dpkg -l|egrep "(cc-proxy|cc-runtime|cc-shim|clear-containers-image|linux-container|qemu-lite|qemu-system-x86|cc-oci-runtime)"":

ii  cc-proxy                                   3.0.12+git.3c6daa6-17                         amd64        
ii  cc-runtime                                 3.0.12+git.d9f04c9-17                         amd64        
ii  cc-runtime-bin                             3.0.12+git.d9f04c9-17                         amd64        
ii  cc-runtime-config                          3.0.12+git.d9f04c9-17                         amd64        
ii  cc-shim                                    3.0.12+git.d01f9a7-17                         amd64        
ii  clear-containers-image                     19790-43                                      amd64        Clear containers image
ii  linux-container                            4.9.60-82                                     amd64        linux kernel optimised for container-like workloads.
ii  qemu-lite                                  2.7.1+git.d4a337fe91-9                        amd64        linux kernel optimised for container-like workloads.
ii  qemu-system-x86                            1:2.5+dfsg-5ubuntu10.16                       amd64        QEMU full system emulation binaries (x86)

No rpm


grahamwhaley commented 6 years ago

Hi @miaoyq - can you please give us at least the exact commands you executed? It might also help to speed things up if you ran cc-collect-data.sh and attached the output as requested in the issue report template.

thanks.

miaoyq commented 6 years ago

@grahamwhaley Thanks for your reply; I have updated the issue and hope you can help. :-)

grahamwhaley commented 6 years ago

Thanks @miaoyq - can you explain what you did (which commands you ran) when you say 'when I create a pod'? How do you create the pod, which docker commands do you run, and can you tell us how your network is broken? Just for reference, one way I know to 'break' the host network with CC is to use the --net=host option (as listed here: https://github.com/clearcontainers/runtime/blob/master/docs/limitations.md#docker---nethost).
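
For reference, a quick way to check whether any containers on the node were started with host networking is to inspect their network mode (a sketch, assuming a working docker CLI on the affected host):

# Print each container's network mode and runtime; any "net=host" entry
# running under cc-runtime is a candidate for this breakage.
docker ps -aq | xargs -r docker inspect \
  --format '{{.Name}} net={{.HostConfig.NetworkMode}} runtime={{.HostConfig.Runtime}}'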

/cc @amshinde @chavafg for any thoughts (especially if this might be swarm and docker version related etc.)

miaoyq commented 6 years ago

@grahamwhaley I created a container via docker run -ti -d busybox top successfully:

# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:78:88:4e brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.198/24 brd 192.168.122.255 scope global dynamic ens6
       valid_lft 3497sec preferred_lft 3497sec
    inet6 fe80::aacb:be64:e387:3820/64 scope link 
       valid_lft forever preferred_lft forever
3: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:5b:ff:df brd ff:ff:ff:ff:ff:ff
    inet 172.120.0.216/24 brd 172.120.0.255 scope global ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe5b:f6df/64 scope link 
       valid_lft forever preferred_lft forever
4: br-0a7129f645d4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:c4:9c:4c:00 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.1/16 scope global br-0a7129f645d4
       valid_lft forever preferred_lft forever
5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:43:8e:39:d3 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
6: docker_gwbridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:05:0b:d8:cb brd ff:ff:ff:ff:ff:ff
    inet 172.19.0.1/16 scope global docker_gwbridge
       valid_lft forever preferred_lft forever
    inet6 fe80::42:5ff:fe0b:d8cb/64 scope link 
       valid_lft forever preferred_lft forever
... ...

But when I created a pod via kubectl create -f myapp.yaml, the host network failed:

# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
4: br-0a7129f645d4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:c4:9c:4c:00 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.1/16 scope global br-0a7129f645d4
       valid_lft forever preferred_lft forever
5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:43:8e:39:d3 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
6: docker_gwbridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:05:0b:d8:cb brd ff:ff:ff:ff:ff:ff
    inet 172.19.0.1/16 scope global docker_gwbridge
       valid_lft forever preferred_lft forever
    inet6 fe80::42:5ff:fe0b:d8cb/64 scope link 
       valid_lft forever preferred_lft forever

Related log: syslog.txt
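
To pin down exactly which interfaces vanish, a small diagnostic sketch that snapshots the link list before and after creating the pod (assumes the myapp.yaml from above):

# Snapshot interface names, create the pod, snapshot again, diff.
ip -o link show | awk -F': ' '{print $2}' | sort > /tmp/links.before
kubectl create -f myapp.yaml
sleep 10   # give the sandbox time to start
ip -o link show | awk -F': ' '{print $2}' | sort > /tmp/links.after
diff /tmp/links.before /tmp/links.after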

miaoyq commented 6 years ago

Ping @amshinde @chavafg: could you help me and give some guidance for this scenario? Thanks :)

grahamwhaley commented 6 years ago

Hi @miaoyq - I have a feeling some folks are still on vacation until Monday maybe. There is a relevant Issue filed over on #890 though I believe - looks like an image update broke some of the k8s networking - we are working on a revert/fix/release at the moment... /cc @jcvenegas

sboeuf commented 6 years ago

@miaoyq I'd like some clarifications on your issue here. You're saying that creating a pod with k8s, using the command kubectl create -f myapp.yaml, messes up the network on your host system? So basically you run ip addr on your host before running k8s, and if you run it after you've started the pod, you get a different network result? Or is this what you see from the guest OS?

Also, have you tried with runc? Is it really tied to Clear Containers?

miaoyq commented 6 years ago

> @miaoyq I'd like some clarifications on your issue here. You're saying that creating a pod with k8s, using the command kubectl create -f myapp.yaml, messes up the network on your host system? So basically you run ip addr on your host before running k8s, and if you run it after you've started the pod, you get a different network result?

@sboeuf Yes.

> Or is this what you see from the guest OS?

Yes.

> Also, have you tried with runc? Is it really tied to Clear Containers?

It's OK with runc, but once I switch to cc-runtime, ens3 and ens6 are removed when I create the first pod.

jcvenegas commented 6 years ago

@miaoyq what network configuration are you using for k8s?

@mcastelino @sameo @sboeuf to me it looks like the host net interfaces end up in the namespace where the container is running. I remember an old behavior of the old cc-oci-runtime 2.x where using host networking would result in a similar issue.
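
A sketch to test that theory: enter the pod sandbox's network namespace via the pause container's init PID and look for the missing host NICs (the container name below is hypothetical; use docker ps to find the real k8s_POD_... name):

# Find the pause (sandbox) container's init PID, then list the links
# inside its network namespace.
PID=$(docker inspect --format '{{.State.Pid}}' k8s_POD_myapp-pod_default_xxx)
nsenter -t "$PID" -n ip link show
# If ens3/ens6 appear here, the runtime moved the physical NICs into
# the pod's namespace instead of giving it a veth/tap pair.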

miaoyq commented 6 years ago

> @miaoyq what network configuration are you using for k8s?

@jcvenegas Sorry, I am not very familiar with networking; how can I find the network configuration?

miaoyq commented 6 years ago

@jcvenegas

# ifconfig
br-0a7129f645d4 Link encap:Ethernet  HWaddr 02:42:c4:9c:4c:00  
          inet addr:172.18.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

docker0   Link encap:Ethernet  HWaddr 02:42:43:8e:39:d3  
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:43ff:fe8e:39d3/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:577 errors:0 dropped:0 overruns:0 frame:0
          TX packets:792 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:61389 (61.3 KB)  TX bytes:71021 (71.0 KB)

docker_gwbridge Link encap:Ethernet  HWaddr 02:42:05:0b:d8:cb  
          inet addr:172.19.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:5ff:fe0b:d8cb/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:217 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:20546 (20.5 KB)

ens3      Link encap:Ethernet  HWaddr 52:54:00:5b:ff:df  
          inet addr:172.120.0.216  Bcast:172.120.0.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe5b:f6df/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:37590 errors:0 dropped:0 overruns:0 frame:0
          TX packets:40739 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:2731653 (2.7 MB)  TX bytes:50139747 (50.1 MB)

ens6      Link encap:Ethernet  HWaddr 52:54:00:78:88:4e  
          inet addr:192.168.122.198  Bcast:192.168.122.255  Mask:255.255.255.0
          inet6 addr: fe80::aacb:be64:e387:3820/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:325752 errors:0 dropped:7791 overruns:0 frame:0
          TX packets:80380 errors:0 dropped:0 overruns:0 carrier:0
          collisions:401503 txqueuelen:1000 
          RX bytes:442248758 (442.2 MB)  TX bytes:6978533 (6.9 MB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:2187033 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2187033 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:957439371 (957.4 MB)  TX bytes:957439371 (957.4 MB)

sboeuf commented 6 years ago

@miaoyq where are you running your command? From a shell inside your container or from a shell on your host?

miaoyq commented 6 years ago

@sboeuf On my host.

sboeuf commented 6 years ago

Ok then we have to investigate this. More questions:

miaoyq commented 6 years ago

ens3 and ens6 are used to connect to other hosts in the subnet, and to connect to the Internet. I have to restart my system to recover them.

miaoyq commented 6 years ago

This is the related log (syslog.txt) when I created a pod.

miaoyq commented 6 years ago

@grahamwhaley @jcvenegas @sboeuf Thank you very much for your help, I think I should go to sleep, continue tomorrow :-)

mcastelino commented 6 years ago

@miaoyq just to clarify, I assume this is your setup:

- Docker setup with the default runtime of docker set to cc-runtime
- Normal docker usage works properly
- You now use docker with kubernetes to set up a k8s environment on the same machine

If you are doing this, it will not work.

The reason is that the kubernetes network plugins run with --net=host. As kubernetes uses whatever the default runtime is to run the PODs, cc-runtime ends up being the runtime kubernetes (via docker) uses to launch PODs.

As cc-runtime does not support --net=host, this results in loss of network connectivity.

So you have two options
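
For reference, keeping runc as docker's default runtime (so the host-networked k8s infra containers never go through cc-runtime) would look roughly like this in /etc/docker/daemon.json; this is a sketch, not necessarily one of the options meant above, and cc-runtime stays registered for explicit --runtime use:

{
  "default-runtime": "runc",
  "runtimes": {
    "cc-runtime": {
      "path": "/usr/bin/cc-runtime"
    }
  }
}

After restarting docker, individual containers can still opt in with docker run --runtime cc-runtime.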

jodh-intel commented 6 years ago

We have documented that limitation here:

https://github.com/clearcontainers/runtime/blob/master/docs/limitations.md#docker---nethost
miaoyq commented 6 years ago

> @miaoyq just to clarify, I assume this is your setup:
>
> - Docker setup with the default runtime of docker set to cc-runtime
> - Normal docker usage works properly
> - You now use docker with kubernetes to set up a k8s environment on the same machine

@mcastelino That fits well with your analysis, thanks so much. :) So currently I must set up CRI-O if I want to use cc-runtime in k8s, right? BTW, will cc or Kata-containers support k8s with docker?

mcastelino commented 6 years ago

@miaoyq

> So currently I must set up CRI-O if I want to use cc-runtime in k8s, right?

Yes
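
For reference, pointing kubelet at CRI-O instead of dockershim uses the remote-runtime flags; a sketch, and the socket path is a CRI-O default that you should verify against your install:

# Tell kubelet to use a remote CRI implementation (CRI-O) rather than
# its built-in dockershim.
kubelet --container-runtime=remote \
  --container-runtime-endpoint=/var/run/crio/crio.sock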

> BTW, will cc or Kata-containers support k8s with docker?

When we support cri-containerd that is a possibility.

miaoyq commented 6 years ago

> We have documented that limitation here:
>
> https://github.com/clearcontainers/runtime/blob/master/docs/limitations.md#docker---nethost

@jodh-intel Thanks, I think I should study the document carefully.

miaoyq commented 6 years ago

@mcastelino Thanks so much. :-)

miaoyq commented 6 years ago

> The reason is that the kubernetes network plugins run with --net=host. As kubernetes uses whatever the default runtime is to run the PODs, cc-runtime ends up being the runtime kubernetes (via docker) uses to launch PODs.
>
> As cc-runtime does not support --net=host, this results in loss of network connectivity.

Hi @mcastelino @jodh-intel, I'm still a bit confused about the above: as far as I know, kubelet creates the pause container with --net=none by default, and the app container's network isn't set up by docker but by the CNI plugin. So how does the plugin end up running with --net=host?
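
One way to check which pod sandboxes actually received the host network namespace (a sketch; the k8s_POD_ prefix is dockershim's sandbox naming convention):

# Only the pause (sandbox) containers hold the pod's network namespace;
# print each sandbox's network mode.
docker ps --format '{{.Names}}' | grep '^k8s_POD_' | while read -r sandbox; do
  echo "$sandbox -> $(docker inspect --format '{{.HostConfig.NetworkMode}}' "$sandbox")"
done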

guangxuli commented 6 years ago

@mcastelino @jodh-intel @sboeuf @jcvenegas That's also what I want to know; are we missing something?

sameo commented 6 years ago

@miaoyq

> BTW, will cc or Kata-containers support k8s with docker?

cc or kata will not work with the docker CRI shim (dockershim) that's currently used by kubelet as its default CRI. But it will work with:

- cri-containerd
- CRI-O
- Frakti

When using dockershim, you will not be able to start and run any privileged container that requires host networking namespace access (i.e. --net=host), and you will most likely end up creating one VM per container, as opposed to one VM per pod with cri-containerd, CRI-O and Frakti. That will also create networking issues.
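
A quick way to observe the one-VM-per-container behaviour on a node (a sketch; the qemu path is taken from the cc-env output above):

# Count running Clear Containers VMs: with dockershim expect roughly one
# qemu process per container, with CRI-O/cri-containerd one per pod.
pgrep -f /usr/bin/qemu-lite-system-x86_64 | wc -l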

miaoyq commented 6 years ago

@sameo Thanks for your detailed explanation :-)