vaijab closed this issue 7 years ago.
Any update on this one? We experienced the same issue with 2.12.1-k and we need enhanced networking.
Is this issue prioritised? Thanks.
There is really not much we can do if upstream hasn't upgraded the driver. As of now, the tree still only contains 2.12.1-k.
Has anyone tried upgrading the driver on their instance and building an AMI?
In case someone else needs this, here is what we did:
First, we need to know the CoreOS release channel. The following instructions assume it's "stable", but you can change it to alpha or beta.
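If you switch channels, every hard-coded "stable" URL below has to change. One way to avoid editing them by hand is a small helper that builds the base URL. This is a sketch that assumes the http://<channel>.release.core-os.net/amd64-usr/current layout used for "stable" below also holds for alpha and beta; coreos_release_url is a hypothetical helper name.

```shell
# Sketch: build the release base URL for a CoreOS channel (default: stable).
# Assumes the same URL layout as the "stable" URLs used in this thread.
coreos_release_url() {
  channel="${1:-stable}"
  case "$channel" in
    stable|beta|alpha) ;;
    *) echo "unknown channel: $channel" >&2; return 1 ;;
  esac
  echo "http://${channel}.release.core-os.net/amd64-usr/current"
}
```

For example: wget "$(coreos_release_url beta)/coreos_developer_container.bin.bz2"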
We also need a CoreOS machine to build on. Set BUILD_SERVER to its IP address or hostname:
export BUILD_SERVER=ip-your-build-server-ip.ec2.internal
ssh core@$BUILD_SERVER
Now you're on the BUILD_SERVER
# Download the CoreOS build tools container
wget http://stable.release.core-os.net/amd64-usr/current/coreos_developer_container.bin.bz2
bunzip2 coreos_developer_container.bin.bz2
# Launch it with systemd-nspawn (a lightweight, Docker-like container runtime)
sudo systemd-nspawn -i coreos_developer_container.bin --share-system --bind /home/core
Now you're inside the build container
# Grab the kernel source
emerge-gitclone
emerge -gKav coreos-sources
cd /usr/src/linux
zcat /proc/config.gz >.config
# Prepare the kernel source tree for building external modules
make modules_prepare
# Download and build the ixgbevf source code
cd /tmp
curl -L http://downloads.sourceforge.net/project/e1000/ixgbevf%20stable/3.1.2/ixgbevf-3.1.2.tar.gz > ixgbevf-3.1.2.tar.gz
tar -zxvf ixgbevf-3.1.2.tar.gz
cd ixgbevf-3.1.2/src/
make
# Move it to a directory accessible outside the container
cp ixgbevf.ko /home/core
Now exit from both the container and the build server. The remaining commands run on the bastion host:
# Figure out the kernel version of the release
export KERNEL_VERSION=$(curl -s http://stable.release.core-os.net/amd64-usr/current/coreos_developer_container_contents.txt | grep ./usr/lib64/modules/ | head -n1 | cut -d/ -f 5)
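The cut -d/ -f 5 above picks the fifth /-separated field of the first matching path, i.e. <version> in ./usr/lib64/modules/<version>/... A slightly more defensive variant is sketched below, assuming the listing contains paths of that form: it extracts only the path with grep -o and requires at least one character after modules/, so a bare ./usr/lib64/modules/ directory entry (if the listing has one) is skipped rather than yielding an empty version. kernel_version_from_listing is a hypothetical helper name.

```shell
# Sketch: read a contents listing on stdin and print the kernel version,
# i.e. the path component right after ./usr/lib64/modules/.
# The [^/ ]+ requires a non-empty component, skipping the bare directory entry.
kernel_version_from_listing() {
  grep -oE '\./usr/lib64/modules/[^/ ]+' | head -n1 | cut -d/ -f5
}
```

Usage: export KERNEL_VERSION=$(curl -s http://stable.release.core-os.net/amd64-usr/current/coreos_developer_container_contents.txt | kernel_version_from_listing)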
# Copy the module from the build-server to bastion:/tmp
scp core@$BUILD_SERVER:/home/core/ixgbevf.ko /tmp/ixgbevf.ko
# Upload the module to s3 (making it publicly readable)
aws s3 cp --acl public-read /tmp/ixgbevf.ko s3://ixgbevf/$KERNEL_VERSION/ixgbevf.ko
On each instance, download the module built for its running kernel:
curl -f https://s3.amazonaws.com/ixgbevf/$(uname -r)/ixgbevf.ko > /home/core/ixgbevf.ko
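Because rmmod takes the interface down before insmod runs, it may be worth confirming that the downloaded file was actually built for the running kernel first. insmod will typically reject a mismatched module ("Invalid module format"), which would leave the instance with no ixgbevf driver loaded. A minimal sketch, assuming kmod's modinfo is available (it supports -F vermagic, whose first word is the kernel release); vermagic_release and module_matches_kernel are hypothetical names.

```shell
# Print the kernel release a vermagic string refers to (its first word),
# e.g. "4.2.2-coreos-r1 SMP mod_unload " -> "4.2.2-coreos-r1".
vermagic_release() {
  printf '%s\n' "$1" | awk '{print $1}'
}

# Succeed only if the module file in $1 was built for the running kernel,
# or for the release given in $2 if one is passed.
# An empty or unreadable file yields an empty vermagic and therefore fails.
module_matches_kernel() {
  built_for=$(vermagic_release "$(modinfo -F vermagic "$1")")
  [ -n "$built_for" ] && [ "$built_for" = "${2:-$(uname -r)}" ]
}
```

Usage: module_matches_kernel /home/core/ixgbevf.ko && { sudo rmmod ixgbevf; sudo insmod /home/core/ixgbevf.ko; }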
Remove the old driver and load the new one:
sudo rmmod ixgbevf; sudo insmod /home/core/ixgbevf.ko
We have this as a systemd unit that runs on startup.
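The thread doesn't include the unit itself, so the following is a hypothetical reconstruction, not the poster's actual file. It writes a oneshot service that repeats the download and module swap above at boot. The unit name ixgbevf-update.service and the UNIT_DIR override are assumptions. systemd expands $ in ExecStart lines, so $$ is used to pass a literal $ through to the shell, and the heredoc delimiter is quoted so the shell writing the file leaves $$ alone too.

```shell
# Sketch: write a oneshot unit that fetches the module for the running kernel
# and swaps it in at boot. UNIT_DIR (hypothetical) defaults to /etc/systemd/system.
write_ixgbevf_unit() {
  unit_dir="${UNIT_DIR:-/etc/systemd/system}"
  # Quoted 'EOF': no shell expansion, so $$ reaches systemd unchanged.
  cat > "$unit_dir/ixgbevf-update.service" <<'EOF'
[Unit]
Description=Replace in-tree ixgbevf with out-of-tree build
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'curl -fsS -o /home/core/ixgbevf.ko https://s3.amazonaws.com/ixgbevf/$$(uname -r)/ixgbevf.ko'
ExecStart=/bin/sh -c 'rmmod ixgbevf; insmod /home/core/ixgbevf.ko'

[Install]
WantedBy=multi-user.target
EOF
}
```

After running sudo sh -c '. ./unit.sh; write_ixgbevf_unit' (or pasting the function into a root shell), enable it with sudo systemctl enable ixgbevf-update.service.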
Thanks @daveey !
@crawford We really need an update to start using CoreOS with the new AWS instances. Has anyone asked the Gentoo Linux kernel team to upgrade ixgbevf to the latest stable (or highest) version?
@mmelnyk The normal process for this is that the driver maintainer (Intel in this case) pushes updates to the mainline kernel.
Thanks @mjg59, got it.
On checking, there have been multiple commits to the in-kernel ixgbevf driver without any updates to the version number (in-kernel drivers rarely have version numbers). Comparing the in-kernel version number to the out of tree driver is therefore somewhat misleading. Unfortunately Intel don't provide a git tree or a revision history for their out of tree driver, so it's difficult to determine exactly what's going on here. The driver included in the latest CoreOS stable should be sufficiently new to work without problems, so if you're still seeing issues with 1010.5 then we'll have to investigate exactly what the problem is - it's unfortunately difficult for us to just drop the out of tree driver in unmodified.
@mjg59 Any further updates on this one? Or should using the latest stable version be enough?
@joshi4 using the latest stable version should be enough.
Currently, CoreOS 808.0.0 ships with Linux 4.2, and the version of the ixgbevf driver is 2.12.1-k. To be fair, I am not sure why the mainline kernel does not upgrade to the latest version of the ixgbevf driver, so this issue is not entirely CoreOS' fault.

The reason this is a problem is that on AWS, certain EC2 instance types have enhanced networking enabled by default, which uses the ixgbevf driver. The recommended version of this driver is >= 2.14.2.

I noticed some weird networking issues on EC2 while using an m4.2xlarge instance with CoreOS 808.0.0. After a few hours, instances start dropping most network traffic: only ICMP packets get through, and SSH stops working as well. Rebooting an instance helps for a while, and after logging in to the instance I didn't see anything unusual in the logs. At first I thought the MTU size of 9001 had something to do with it, so I set the MTU to 1500, but that didn't help.
Switching back to instance types which do not have enhanced networking enabled solves the problem.
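To see which driver version an instance is actually running, ethtool -i <interface> reports the driver name and version (for example "version: 2.12.1-k"). The sketch below compares that against the recommended 2.14.2, stripping a suffix such as -k first and relying on GNU sort -V for version ordering. As @mjg59 points out in the thread, the in-kernel version string isn't bumped for every fix, so treat the result as a hint rather than proof. ixgbevf_version_ok is a hypothetical name, and eth0 in the usage line is an assumed interface name.

```shell
# Sketch: read `ethtool -i <iface>` output on stdin and succeed if the
# reported driver version is >= the minimum (default 2.14.2).
ixgbevf_version_ok() {
  min="${1:-2.14.2}"
  # Take the "version:" line and drop any suffix like "-k" (in-kernel build).
  ver=$(awk -F': ' '$1 == "version" {print $2}' | sed 's/-.*//')
  [ -n "$ver" ] || return 1
  # GNU sort -V orders version strings numerically; if the minimum sorts
  # first (or the two are equal), the running version meets it.
  [ "$(printf '%s\n%s\n' "$min" "$ver" | sort -V | head -n1)" = "$min" ]
}
```

Usage: ethtool -i eth0 | ixgbevf_version_ok || echo "ixgbevf older than 2.14.2"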