coreos / bugs

Issue tracker for CoreOS Container Linux
https://coreos.com/os/eol/

kernel panic with vxlan (overlay networking) on vmware #1671

Open basvdlei opened 8 years ago

basvdlei commented 8 years ago

Issue Report

Bug

We are getting kernel panics when experimenting with Docker's overlay network on some of our CoreOS machines running on VMware.

CoreOS Version

NAME=CoreOS
ID=coreos
VERSION=1185.3.0
VERSION_ID=1185.3.0
BUILD_ID=2016-11-01-0605
PRETTY_NAME="CoreOS 1185.3.0 (MoreOS)"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

Environment

VMware ESXi

Expected Behavior

Running/starting/stopping Docker containers on an overlay network should not crash the system.

Actual Behavior

Kernel panics like:

- BUG: unable to handle kernel NULL pointer dereference at 0000000000000045
- BUG: unable to handle kernel paging request at ffffffff00000158

Full kernel panic trace:

Reproduction Steps

Reproduce using Docker

  1. Configure etcd as the key-value store for Docker using a systemd drop-in:

/etc/systemd/system/docker.service.d/docker-ops.conf

[Service]
Environment='DOCKER_OPTS=--cluster-store=etcd://xx.xx.xx.xx:2379/docker --cluster-advertise=xx.xx.xx.xx:2376 --cluster-store-opt kv.cacertfile=/etc/ssl/etcd/ca.pem --cluster-store-opt kv.certfile=/etc/ssl/etcd/cert.pem --cluster-store-opt kv.keyfile=/etc/ssl/etcd/key.pem'
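The drop-in only takes effect after a daemon reload and a Docker restart. A minimal sketch of staging and installing it (the shortened DOCKER_OPTS keeps the thread's xx.xx.xx.xx placeholders; note that the Environment= line must sit under a [Service] section header for systemd to accept the drop-in):

```shell
#!/bin/bash
# Stage the drop-in locally, then install and activate it on the host.
# The xx.xx.xx.xx endpoints are the same placeholders as in the report.
STAGE=${STAGE:-.}
cat > "$STAGE/docker-ops.conf" <<'EOF'
[Service]
Environment='DOCKER_OPTS=--cluster-store=etcd://xx.xx.xx.xx:2379/docker --cluster-advertise=xx.xx.xx.xx:2376'
EOF
# Then, on the host:
#   sudo install -D -m 0644 docker-ops.conf /etc/systemd/system/docker.service.d/docker-ops.conf
#   sudo systemctl daemon-reload && sudo systemctl restart docker.service
```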
  2. Create a sample script that starts/stops containers on an overlay network:

crash-with-docker.sh:

#!/bin/bash
set -eux

NAME=${1-test}
INSTANCES=${2-10}
PAUSE=${3-10}

docker network create -d overlay "${NAME}"

# Loops forever; interrupt with Ctrl-C, then clean up with:
#   docker network rm "${NAME}"
while true; do
    for i in $(seq 1 "$INSTANCES"); do
        docker run -d -P --net "${NAME}" \
            --name "${NAME}-${i}" \
            nginx:alpine
    done

    sleep "$PAUSE"

    for i in $(seq 1 "$INSTANCES"); do
        docker stop "${NAME}-${i}" && docker rm "${NAME}-${i}"
    done

    sleep "$PAUSE"
done
  3. Run the script with two instances at the same time (offsetting the create/remove seems to make it hit easier):
    docker pull nginx:alpine
    ./crash-with-docker.sh test1 &
    sleep 5
    ./crash-with-docker.sh test2 &
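The offset launch above can be wrapped in a small helper; the function name and defaults are hypothetical, and it defaults to echo so it can be dry-run without Docker:

```shell
#!/bin/bash
# Hypothetical helper: start two copies of a workload command with a
# delay between them, so their create/remove phases interleave.
offset_run() {
    cmd=$1          # e.g. ./crash-with-docker.sh
    offset=${2-1}   # seconds between the two starts
    "$cmd" test1 &
    sleep "$offset"
    "$cmd" test2 &
    wait            # block until both background copies finish
}
offset_run echo 0   # dry run: launches "echo test1" and "echo test2"
```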

Reproduce with iproute2

It can also be reproduced using just the iproute2 tools, where I try to replicate some of the steps that Docker's libnetwork uses to set up an overlay network:

  1. Create a script that sets up/deletes bridges connected to a vxlan device and a veth pair inside a netns:

crash-with-iproute2.sh:

#!/bin/bash

set -x

START=${1-0}
END=${2-0}
INTERFACE=${3-ens192}

while true; do
    for i in $(seq "$START" "$END"); do
        ns=test${i}
        vxlanif=vxlan${i}
        # Create a netns containing a bridge, a vxlan device, and one
        # end of a veth pair, mimicking libnetwork's overlay plumbing.
        ip netns add "$ns"
        ip netns exec "$ns" ip link add dev br0 type bridge
        ip link add "$vxlanif" mtu 1450 type vxlan id $((256+i)) dstport 4789 learning proxy l2miss l3miss dev "$INTERFACE"
        ip link set dev "$vxlanif" netns "$ns"
        ip netns exec "$ns" ip link set dev "$vxlanif" name vxlan1
        ip netns exec "$ns" brctl addif br0 vxlan1
        ip link add dev veth1 mtu 1450 type veth peer name "vethb${i}" mtu 1450
        ip link set dev veth1 netns "$ns"
        ip netns exec "$ns" ip link set dev veth1 name eth0
        ip netns exec "$ns" brctl addif br0 eth0

        ip netns exec "$ns" ip link set vxlan1 up
        ip netns exec "$ns" ip link set eth0 up
        ip netns exec "$ns" ip link set br0 up
    done

    sleep 5

    # Deleting the netns tears down every interface inside it.
    for i in $(seq "$START" "$END"); do
        ip netns del "test${i}"
    done
done
  2. Run two instances of this script (offsetting the create/remove by waiting a little between starts seems to make it hit easier):
systemctl stop docker.service docker.socket
./crash-with-iproute2.sh 0 49 ens192 & 
sleep 2
./crash-with-iproute2.sh 50 99 ens192 &
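If a run is interrupted before the delete loop, stale namespaces can pile up. A small cleanup sketch (the test${i} naming pattern comes from the scripts above; the deletions themselves need root):

```shell
#!/bin/bash
# Remove leftover netns from interrupted runs of the repro scripts.
list_test_ns() {
    # "ip netns list" may append extra fields such as "(id: 0)", so
    # keep only the first field and match the test<N> naming pattern.
    ip netns list 2>/dev/null | awk '$1 ~ /^test[0-9]+$/ {print $1}'
}

for ns in $(list_test_ns); do
    ip netns del "$ns"
done
```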

Other Information

I've not had much luck trying to reproduce this in Vagrant (KVM or VirtualBox), which suggests the problem is VMware-specific.

crawford commented 7 years ago

I was able to reproduce this failure with the given script on AWS with a t2.micro instance in us-west-2 (ami-6f1eb80f).

I tested this against both the 4.7.3 and 4.9.0 kernels and it failed in the same way. It appears as though the panic is happening on the deletion of the netns (sudo ip netns del ${ns}).

crawford commented 7 years ago

Adding in delays around the namespace deletion prevents the panic, so this looks like a race condition rather than resource exhaustion. I'm in the process of further whittling down the script to figure out a minimal repro case.
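One way to probe the race hypothesis: serialize the deletions across both script instances with flock(1); if concurrent teardown is the trigger, the panic should disappear. This is a sketch, not something tried in the thread; the lock path is arbitrary and the echo stands in for the real command:

```shell
#!/bin/bash
# Serialize "ip netns del" across concurrently running script
# instances by taking a shared file lock around each deletion.
LOCK=${LOCK:-/tmp/netns-del.lock}

serialized_del() {
    # flock(1) creates the lock file if needed and blocks until held.
    flock "$LOCK" sh -c "echo would delete: $1"   # real: ip netns del "$1"
}

serialized_del test0
```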

crawford commented 7 years ago

This appears to be the minimal case:

#!/bin/bash

set -x

START=${1-0}
END=${2-0}
INTERFACE=${3-ens192}

while true; do
        for i in $(seq $START $END); do
                ns=test${i}
                vxlanif=vxlan${i}
                sudo ip netns add $ns
                sudo ip link add $vxlanif mtu 1450 type vxlan id $((256+$i)) dstport 4789 learning proxy l2miss l3miss dev $INTERFACE
                sudo ip link set dev $vxlanif netns $ns
        done

        sleep 5

        for i in $(seq $START $END); do
                ns=test${i}
                sudo ip netns del $ns
        done
done

Run with:

./crash-with-iproute2.sh 0 49 eth0 &
sleep 0.1
./crash-with-iproute2.sh 50 99 eth0 &

I'm not sure at this point whether or not there is any significance to the vxlan device being a part of the namespace, or if it merely slows down the teardown enough to allow two deletions to run concurrently.

basvdlei commented 7 years ago

Nice, this script hits the bug consistently and fast. It also rules out any bridge involvement. My initial suspicion that it was VMware-only was also wrong; I've now confirmed panics on both KVM and VMware.

I kept wondering why we can hit this bug so reliably when I can't seem to find any other reported cases. So I tried reproducing the panic on Debian Jessie with kernels 3.18 and 4.7.0 (from backports) without any luck. But an older CoreOS version, v1122.3.0 (also with kernel 4.7.0), crashes immediately.

Then I got the idea to disable systemd-networkd and that seems to make the bug not hit any more!

sudo systemctl stop systemd-networkd.service
sudo systemctl mask systemd-networkd.service
crawford commented 7 years ago

Then I got the idea to disable systemd-networkd and that seems to make the bug not hit any more!

Very interesting.

1262.0.0 added the ability to tell networkd not to manage certain interfaces (on previous versions, networkd would attempt to manage every interface, even if there was no configuration). I attempted to reproduce this failure on that release with the following config, but was unable to do so (since networkd is no longer managing those interfaces).

[Match]
Name=vxlan*

[Link]
Unmanaged=yes

Could you give that release a shot and see if it works around the kernel bug for you as well?

basvdlei commented 7 years ago

Tested the minimal-case script on a VMware node upgraded to 1262.0.0 with the above network unit installed, and it no longer caused a panic. So I went back to the initial Docker overlay test case (crash-with-docker.sh), which did still cause a panic.

Docker does rename the vxlan interfaces (to 'vx-') though, so I've changed the config to match on the driver name instead:

[Match]
Driver=vxlan

[Link]
Unmanaged=yes

Running the crash-with-docker.sh test without panics for about an hour now. I'll keep it running for a while, but I'm hopeful about this workaround.

CyrilPeponnet commented 7 years ago

Got the same issue on either stable or alpha, with a Docker service failing in a loop, so containers get spawned and die back to back. I can hit the kernel panic in a few seconds.

Note that I'm running VirtualBox. I will try the mitigation above.

CyrilPeponnet commented 7 years ago

In my case the

[Match]
Driver=vxlan

[Link]
Unmanaged=yes

Didn't help, but after disabling and masking systemd-networkd I no longer have the issue.

Otherwise I get this kernel panic: https://github.com/coreos/bugs/files/600438/20161114-Panic-3.txt. I will try disabling management of bridges as well to see if it helps.

CyrilPeponnet commented 7 years ago

Looks like I had to restart the service first... This looks good for now.