coreos / bugs

Issue tracker for CoreOS Container Linux
https://coreos.com/os/eol/

ixgbevf: upgrade to the latest stable version #488

Closed vaijab closed 7 years ago

vaijab commented 9 years ago

Currently, CoreOS 808.0.0 ships with Linux 4.2, and its ixgbevf driver is version 2.12.1-k.

To be fair, I am not sure why the mainline kernel has not picked up the latest version of the ixgbevf driver, so this issue is not entirely CoreOS' fault.

This is a problem because certain EC2 instance types on AWS have enhanced networking enabled by default, which uses the ixgbevf driver. The recommended version of this driver is >= 2.14.2. I noticed some weird networking issues on EC2 while using an m4.2xlarge instance with CoreOS 808.0.0. After a few hours, instances start dropping most network traffic: only ICMP packets get through, and SSH stops working as well. Rebooting an instance helps for some time. After logging in to the instance, I didn't see anything weird in the logs.

At first I thought that the MTU size of 9001 had something to do with what I was experiencing, so I set the MTU to 1500, but that didn't help.

Switching back to instance types which do not have enhanced networking enabled solves the problem.
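
To see which ixgbevf version an instance is running and how it compares with the 2.14.2 threshold mentioned above, a minimal sketch might look like the following. The helper names are hypothetical, and it assumes GNU `sort -V` and `modinfo` are available. In-kernel version strings such as 2.12.1-k do not necessarily track the out-of-tree driver's numbering, so treat the result as a hint rather than proof.

```shell
#!/bin/sh
# Hypothetical helpers; only the 2.14.2 threshold comes from the thread.

# strip_suffix VERSION: drop a trailing "-k"-style suffix (2.12.1-k -> 2.12.1).
strip_suffix() {
    printf '%s\n' "${1%%-*}"
}

# version_ge A B: succeed when A >= B under GNU `sort -V` ordering.
version_ge() {
    a=$(strip_suffix "$1")
    b=$(strip_suffix "$2")
    [ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n1)" = "$b" ]
}

# check_ixgbevf MIN: compare the installed module's version against MIN.
check_ixgbevf() {
    current=$(modinfo -F version ixgbevf) || return 2
    if version_ge "$current" "$1"; then
        echo "ixgbevf $current is at least $1"
    else
        echo "ixgbevf $current is older than $1"
        return 1
    fi
}
```

On an affected instance this would be called as `check_ixgbevf 2.14.2`.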

jtblin commented 8 years ago

Any update on this one? We experienced the same issue with 2.12.1-k and we need enhanced networking.

azilbersteinSFDC commented 8 years ago

Is this issue prioritised? Thanks.

crawford commented 8 years ago

There is really not much we can do if upstream hasn't upgraded the driver. As of now, the tree still only contains 2.12.1-k.

daveey commented 8 years ago

Has anyone tried upgrading the driver on their instance, and building an AMI?

daveey commented 8 years ago

In case someone else needs this, here is what we did:

Building The Module

First we need to know the CoreOS release branch. The following instructions assume it's "stable", but you can change it to alpha or beta.

We also need a CoreOS machine. Set BUILD_SERVER to its hostname or IP address.

export BUILD_SERVER=ip-your-build-server-ip.ec2.internal
ssh core@$BUILD_SERVER

Now you're on the BUILD_SERVER

# Download the CoreOS build tools container

wget http://stable.release.core-os.net/amd64-usr/current/coreos_developer_container.bin.bz2
bunzip2 coreos_developer_container.bin.bz2

# Launch it (via some weird primitive docker)
sudo systemd-nspawn -i coreos_developer_container.bin --share-system --bind /home/core

Now you're inside the build container

# Grab the kernel source
emerge-gitclone
emerge -gKav coreos-sources
cd /usr/src/linux
zcat /proc/config.gz >.config

# Prepare the kernel tree for building external modules
make modules_prepare

# Download and build the ixgbevf source code
cd /tmp
curl -L http://downloads.sourceforge.net/project/e1000/ixgbevf%20stable/3.1.2/ixgbevf-3.1.2.tar.gz > ixgbevf-3.1.2.tar.gz
tar -zxvf ixgbevf-3.1.2.tar.gz
cd ixgbevf-3.1.2/src/
make

# Move it to a directory accessible outside the container
cp ixgbevf.ko /home/core

Now exit from both the container and build-server

# Figure out the kernel version of the release
export KERNEL_VERSION=$(curl -s  http://stable.release.core-os.net/amd64-usr/current/coreos_developer_container_contents.txt | grep ./usr/lib64/modules/ | head -n1 | cut -d/ -f 5)

# Copy the module from the build-server to bastion:/tmp
scp core@$BUILD_SERVER:/home/core/ixgbevf.ko /tmp/ixgbevf.ko

# Upload the module to s3 (making it publicly readable)
aws s3 cp --acl public-read /tmp/ixgbevf.ko s3://ixgbevf/$KERNEL_VERSION/ixgbevf.ko
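
The `KERNEL_VERSION` pipeline above takes the fifth `/`-separated field of the first listing line under `./usr/lib64/modules/`. The sketch below wraps the same steps in a hypothetical function so the extraction can be tried on sample input. It assumes each listing line starts with the path, and it uses `grep -F` so the dots are matched literally. The sample line in the comments is made up for illustration.

```shell
#!/bin/sh
# kernel_version_from_listing: read a contents listing on stdin and print the
# first kernel version found under ./usr/lib64/modules/.
kernel_version_from_listing() {
    grep -F './usr/lib64/modules/' | head -n1 | cut -d/ -f5
}

# Hypothetical listing line, for illustration only:
#   ./usr/lib64/modules/4.2.2-coreos-r1/modules.dep
# Split on "/", the fields are ".", "usr", "lib64", "modules", "4.2.2-coreos-r1",
# so field 5 is the kernel version directory.
```

The value has to match `uname -r` on the target instances, because the install step below uses `$(uname -r)` to build the download URL.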

Installing The Module

Download the module

curl -f https://s3.amazonaws.com/ixgbevf/$(uname -r)/ixgbevf.ko > /home/core/ixgbevf.ko

Remove the old, install the new:

sudo rmmod ixgbevf; sudo insmod /home/core/ixgbevf.ko

We have this as a systemd unit that runs on startup
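
The unit itself is not shown in this thread. As a rough sketch only, a oneshot unit's `ExecStart` could point at a script along these lines. The function names and the `--run` flag are hypothetical; the bucket URL and module path are taken from the commands above. It downloads to a temporary file first and swaps drivers only if the download succeeded, since unloading ixgbevf takes the network interface down.

```shell
#!/bin/sh
# Hypothetical boot-time helper; bucket name and paths follow the commands above.
BUCKET_URL="https://s3.amazonaws.com/ixgbevf"
MODULE_PATH="/home/core/ixgbevf.ko"

# module_url RELEASE: URL of the module built for a given kernel release.
module_url() {
    printf '%s/%s/ixgbevf.ko\n' "$BUCKET_URL" "$1"
}

# install_module: fetch the module for the running kernel, then replace the
# loaded driver. Must run as root (systemd system units do by default).
install_module() {
    tmp=$(mktemp) || return 1
    if curl -fsS -o "$tmp" "$(module_url "$(uname -r)")"; then
        mv "$tmp" "$MODULE_PATH"
        rmmod ixgbevf          # may fail if the driver is not loaded
        insmod "$MODULE_PATH"  # the function returns insmod's status
    else
        rm -f "$tmp"
        echo "no prebuilt ixgbevf for $(uname -r); keeping the stock driver" >&2
        return 1
    fi
}

# Run only when asked to, so the file can also be sourced.
if [ "${1:-}" = "--run" ]; then
    install_module
fi
```

Because the module is built against one specific kernel, a CoreOS update that changes `uname -r` needs a matching rebuild and upload; otherwise the download fails and the stock driver stays in place.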

mmelnyk commented 8 years ago

Thanks @daveey! @crawford We really need this update to start using CoreOS with the new AWS instances, so has somebody asked the Gentoo team or the Linux kernel team to upgrade ixgbevf to the latest stable (or highest) version?

mjg59 commented 8 years ago

@mmelnyk The normal process for this is that the driver maintainer (Intel in this case) pushes updates to the mainstream kernel.

mmelnyk commented 8 years ago

Thanks @mjg59, got it.

mjg59 commented 8 years ago

On checking, there have been multiple commits to the in-kernel ixgbevf driver without any updates to the version number (in-kernel drivers rarely have version numbers). Comparing the in-kernel version number to that of the out-of-tree driver is therefore somewhat misleading. Unfortunately, Intel don't provide a git tree or a revision history for their out-of-tree driver, so it's difficult to determine exactly what's going on here. The driver included in the latest CoreOS stable should be sufficiently new to work without problems, so if you're still seeing issues with 1010.5 then we'll have to investigate exactly what the problem is. Unfortunately, it's difficult for us to just drop the out-of-tree driver in unmodified.

joshi4 commented 8 years ago

@mjg59 Any further updates on this one? Or should using the latest stable version be enough?

crawford commented 7 years ago

@joshi4 using the latest stable version should be enough.