amzn / amzn-drivers

Official AWS drivers repository for Elastic Network Adapter (ENA) and Elastic Fabric Adapter (EFA)

~214 byte limit with eBPF (Cilium) on Linux Kernels 5.3/4* #131

Closed AugustinasS closed 4 years ago

AugustinasS commented 4 years ago

When using an ENA-enabled EC2 instance (let's say T3a.medium), inbound and outbound traffic gets dropped if a packet is above ~214 bytes, when the eBPF functionality in the Cilium Kubernetes CNI (specifically kube-proxy-replacement) is in use on Linux kernel 5.3 or 5.4.

To reproduce the issue, use Amazon Linux 2 (AMI 2.0.20200406.0 x86_64 HVM gp2) or Ubuntu 19.10/20.04:

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF
yum install -y kubectl conntrack

curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 \
  && chmod +x minikube

sudo cp minikube /usr/local/bin && rm minikube
echo "bpffs                      /sys/fs/bpf             bpf     defaults 0 0" >> /etc/fstab
amazon-linux-extras install docker

minikube start --network-plugin=cni --driver=none

kubectl create -f https://raw.githubusercontent.com/cilium/cilium/v1.7/install/kubernetes/quick-install.yaml

At this point one can ping this VM from outside, or ping from inside to the internet: ping -s 300 amazon.com

This should work. The problem appears after upgrading the Linux kernel to 5.4: sudo amazon-linux-extras install kernel-ng

After the reboot, once the Cilium pod is up:

[root@ip-172-31-44-20 ~]# ping -s 214 amazon.com
PING amazon.com (176.32.103.205) 214(242) bytes of data.
222 bytes from 176.32.103.205 (176.32.103.205): icmp_seq=1 ttl=227 time=85.6 ms
222 bytes from 176.32.103.205 (176.32.103.205): icmp_seq=2 ttl=227 time=85.6 ms
222 bytes from 176.32.103.205 (176.32.103.205): icmp_seq=3 ttl=227 time=85.6 ms
222 bytes from 176.32.103.205 (176.32.103.205): icmp_seq=4 ttl=227 time=85.6 ms
222 bytes from 176.32.103.205 (176.32.103.205): icmp_seq=5 ttl=227 time=85.6 ms
^C
--- amazon.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4004ms
rtt min/avg/max/mdev = 85.620/85.632/85.646/0.185 ms
[root@ip-172-31-44-20 ~]# ping -s 215 amazon.com
PING amazon.com (176.32.98.166) 215(243) bytes of data.
^C
--- amazon.com ping statistics ---
46 packets transmitted, 0 received, 100% packet loss, time 46048ms
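
The exact cutoff between 214 and 215 bytes was found by hand above. A minimal sketch of automating that search is shown here, assuming that every payload size at or below the cutoff passes and every size above it fails. The function names `probe` and `largest_passing_size` are made up for this illustration and are not part of any tool mentioned in the thread.

```shell
#!/usr/bin/env bash
# probe SIZE HOST: succeeds if a single ICMP echo carrying SIZE payload
# bytes gets a reply within one second (iputils ping options).
probe() {
    ping -c 1 -W 1 -s "$1" "$2" >/dev/null 2>&1
}

# largest_passing_size HOST LO HI: binary search for the largest payload
# size in [LO, HI] that still gets a reply. Assumes the failure is
# monotonic (small packets pass, large packets are dropped).
largest_passing_size() {
    local host=$1 lo=$2 hi=$3 mid
    # If even the smallest size fails, there is nothing to search for.
    probe "$lo" "$host" || { echo "none"; return 1; }
    while (( lo < hi )); do
        mid=$(( (lo + hi + 1) / 2 ))   # round up so the loop always shrinks
        if probe "$mid" "$host"; then
            lo=$mid
        else
            hi=$(( mid - 1 ))
        fi
    done
    echo "$lo"
}

# Example (needs network access):
#   largest_passing_size amazon.com 64 1400
```

A single lost packet would mislead the search, so on a lossy link it may be worth raising the `-c` count inside `probe`.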

Powering off the instance and changing the instance family to T2 or M4 makes the problem go away, as these instances do not support ENA; switching back to T3 brings the issue back.

Kernel 4.19 worked fine; the upgrade to 5.4 broke it.

Disabling Cilium's kube-proxy replacement (kube-proxy-replacement=disabled) also removes the problem.

Hence the ENA driver does not seem to work well with some of the eBPF functionality on the latest LTS kernels.
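
Since the behaviour depends on whether the instance uses ENA, it can help to confirm which kernel driver is bound to the network interface before and after changing the instance type. This is a small sketch that reads the standard sysfs driver symlink; the helper name `nic_driver` and the interface name `eth0` are assumptions for illustration.

```shell
#!/usr/bin/env bash
# nic_driver IFACE [SYSFS_NET_ROOT]: print the name of the kernel driver
# bound to a network interface, by resolving the sysfs "driver" symlink.
# The optional second argument only exists to make the helper testable.
nic_driver() {
    local iface=$1 root=${2:-/sys/class/net}
    local link="$root/$iface/device/driver"
    # Virtual interfaces (lo, veth, ...) have no device/driver entry.
    [ -e "$link" ] || return 1
    basename "$(readlink -f "$link")"
}

# Example: prints "ena" on an instance whose eth0 is an ENA device.
#   nic_driver eth0
#   uname -r      # kernel version, to tell 4.19 from 5.4
```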

AWSNB commented 4 years ago

A bug was fixed in the ENA XDP implementation and released to GitHub. The release notes say: "* Fix XDP PASS issue due to incorrect handling of offset in rx_info."

https://github.com/amzn/amzn-drivers/commit/ccbb1fe2c2f2ab3fc6d7827b012ba8ec06f32c39

Could you try ENA driver 2.2.8 from GitHub to confirm whether it solves the problem? We are working on releasing the fix to the kernel (it is probably there already, but not backported to LTS).
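
To see whether an installed driver is already at or above the suggested 2.2.8, the version string can be compared numerically rather than as plain text. The sketch below relies on `sort -V` from GNU coreutils; `version_ge` is a hypothetical helper name, and stripping a trailing "g" assumes the version format shown by modinfo later in this thread (for example 2.2.9g).

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds if dotted version A is greater than or equal
# to B. Uses GNU sort's version ordering, so 2.2.15 sorts above 2.2.8.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: compare the installed module's version against 2.2.8.
#   installed=$(modinfo -F version ena)
#   installed=${installed%g}          # drop the trailing "g" suffix
#   version_ge "$installed" 2.2.8 && echo "driver is 2.2.8 or newer"
```

Note that `modinfo` reports the module found on disk for the running kernel, which is not necessarily the one currently loaded.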

ShayAgros commented 4 years ago

Hi, in case the above doesn't solve the issue, you might try applying the attached patch: 0001-Bug-Fix-copy-pkt-headers-to-skb-inline-data.patch.txt

This patch fixes a known issue in our driver. It can be applied using git am 0001-Bug-Fix-copy-pkt-headers-to-skb-inline-data.patch.txt on top of the current 'master' branch.

Please let us know if either of the solutions helped solve the issue.

AugustinasS commented 4 years ago

Unfortunately, neither suggestion helped.

I can't be certain that I applied the Bug-Fix-copy patch suggested by @ShayAgros correctly, but at least I got different binaries:

# diff kmod-ena-2.2.9-1.amzn2.25.x86_64.rpm kmod-ena-2.2.9-1.amzn2.25.x86_64_patch.rpm
Binary files kmod-ena-2.2.9-1.amzn2.25.x86_64.rpm and kmod-ena-2.2.9-1.amzn2.25.x86_64_patch.rpm differ

Here is output from modinfo

modinfo ena

filename:       /lib/modules/5.4.38-17.76.amzn2.x86_64/extra/ena/ena.ko
version:        2.2.9g
license:        GPL
description:    Elastic Network Adapter (ENA)
author:         Amazon.com, Inc. or its affiliates
srcversion:     CBF3F17443FAB1A53E24D56
alias:          pci:v00001D0Fd0000EC21sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd0000EC20sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd00001EC2sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd00000EC2sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd00000051sv*sd*bc*sc*i*
depends:
retpoline:      Y
name:           ena
vermagic:       5.4.38-17.76.amzn2.x86_64 SMP mod_unload modversions
parm:           debug:Debug level (0=none,...,16=all) (int)
parm:           rx_queue_size:Rx queue size. The size should be a power of 2. Max value is 8K
 (int)
parm:           force_large_llq_header:Increases maximum supported header size in LLQ mode to 224 bytes, while reducing the maximum TX queue size by half.
 (int)
parm:           num_io_queues:Sets number of RX/TX queues to allocate to device. The maximum value depends on the device and number of online CPUs.
 (int)

[root@ip-172-31-44-20 x86_64]# ping -s 210 www.amazon.com -c 3

PING e15316.e22.akamaiedge.net (23.210.253.71) 210(238) bytes of data.
218 bytes from a23-210-253-71.deploy.static.akamaitechnologies.com (23.210.253.71): icmp_seq=1 ttl=50 time=0.694 ms
218 bytes from a23-210-253-71.deploy.static.akamaitechnologies.com (23.210.253.71): icmp_seq=2 ttl=50 time=0.756 ms
218 bytes from a23-210-253-71.deploy.static.akamaitechnologies.com (23.210.253.71): icmp_seq=3 ttl=50 time=0.737 ms

--- e15316.e22.akamaiedge.net ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2020ms
rtt min/avg/max/mdev = 0.694/0.729/0.756/0.025 ms

[root@ip-172-31-44-20 x86_64]# ping -s 220 www.amazon.com -c 3

PING e15316.e22.akamaiedge.net (23.210.253.71) 220(248) bytes of data.

--- e15316.e22.akamaiedge.net ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2026ms

ShayAgros commented 4 years ago

Hi @AugustinasS, can you please contact me via shayagr@amazon.com for further debugging?

AugustinasS commented 4 years ago

Actually, applying the patch this time did solve the issue:

git clone https://github.com/amzn/amzn-drivers.git
cd amzn-drivers
git am 0001-Bug-Fix-copy-pkt-headers-to-skb-inline-data.patch.txt
cd kernel/linux/ena
make
sudo make -C /lib/modules/`uname -r`/build M=`pwd` modules
sudo modprobe -r ena && sudo modprobe ena
dmesg | grep "Elastic Network Adapter (ENA) v[0-9]\.[0-9]\.[0-9]\+g"

The last line of the output should be

ena 0000:00:05.0: Elastic Network Adapter (ENA) v2.2.15g
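
The version check above can also be scripted. This is a rough sketch that pulls the version out of the driver's banner line, assuming the banner keeps the format shown here. `ena_version_from_log` is a made-up helper name; it reads log text on standard input and prints the version from the last matching line.

```shell
#!/usr/bin/env bash
# ena_version_from_log: read kernel log text on stdin and print the ENA
# driver version from the last banner line, e.g. "2.2.15g".
ena_version_from_log() {
    sed -n 's/.*Elastic Network Adapter (ENA) v\([0-9][0-9.]*g\{0,1\}\).*/\1/p' \
        | tail -n 1
}

# Example: version announced by the most recently loaded ena module.
#   dmesg | ena_version_from_log
```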

@ShayAgros thanks