coreos / bugs

Issue tracker for CoreOS Container Linux
https://coreos.com/os/eol/

Support enhanced networking in EC2 AMI #273

Closed: crawford closed this issue 9 years ago

crawford commented 9 years ago

From @patrickbcullen on November 24, 2014 18:20

The CoreOS EC2 AMI does not currently support enhanced networking. See http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html. Can you add support for this to the AMI build process? You would also need to upgrade the ixgbevf kernel module to a newer version, per the EC2 documentation.

Copied from original issue: coreos/manifest#52

crawford commented 9 years ago

From @adinapoli on February 16, 2015 10:48

:+1: This would be awesome to have; any progress on this front?

Thanks! Alfredo

kelindar commented 9 years ago

+1, this is very important for us too. Is it planned?

crawford commented 9 years ago

There are no concrete plans to support this at the moment. We still need to look into this.

JamieCressey commented 9 years ago

+1 on this too

cedric-vermeulen commented 9 years ago

+1 for this feature too, it would be awesome!

skippy commented 9 years ago

+1

kelseyhightower commented 9 years ago

@crawford Any progress on this one? Any huge technical challenges to overcome?

epipho commented 9 years ago

+1

We have a variety of network heavy apps (analytics event ingestion, large databases, low latency games) that would benefit from this.

crawford commented 9 years ago

@kelseyhightower still no progress on this one. I'll have a chance to look at this next week.

guruvan commented 9 years ago

This would be awesome for our blockchain servers - they really could use it :+1:

jumanjiman commented 9 years ago

related issue: https://github.com/coreos/coreos-overlay/issues/1148

rynbrd commented 9 years ago

I'm looking for this support as well.

jumanjiman commented 9 years ago

Just to be clear, the current driver in the CoreOS AMI works; it's just not as recent as Amazon recommends.

core@ip-192-168-17-58 ~ $ . /etc/os-release 
core@ip-192-168-17-58 ~ $ echo $VERSION
633.1.0
core@ip-192-168-17-58 ~ $ modinfo ixgbevf | grep -e version -e signer
version:        2.12.1-k
srcversion:     4E6C63FFF65E4F7CEFEFE9E
signer:         Magrathea: Glacier signing key
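Comparing that version against Amazon's recommended minimum (2.14.2, the figure quoted from the EC2 enhanced networking guide later in this thread) can be scripted. This is a minimal sketch, assuming GNU `sort -V` is available and that dropping the in-kernel `-k` suffix is an acceptable way to compare versions; the function names are hypothetical.

```bash
# Hypothetical helpers for checking the loaded ixgbevf driver version.

# Print the version modinfo reports for ixgbevf, minus the "-k" suffix that
# in-kernel builds carry (e.g. "2.12.1-k" -> "2.12.1").
ixgbevf_version() {
  modinfo -F version ixgbevf | sed 's/-k$//'
}

# Succeed if version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort) to order the two strings.
version_at_least() {
  local have="$1" want="$2" lowest
  lowest="$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)"
  [ "$lowest" = "$want" ]
}

# Report whether the driver meets a minimum version (default 2.14.2).
# Returns 0 if it does, 1 if it is older, 2 if no version could be read.
check_ixgbevf() {
  local min="${1:-2.14.2}" v
  v="$(ixgbevf_version)"
  [ -n "$v" ] || return 2
  if version_at_least "$v" "$min"; then
    echo "ixgbevf $v meets the recommended minimum"
  else
    echo "ixgbevf $v is older than the recommended $min"
    return 1
  fi
}
```

On the 633.1.0 host above, `check_ixgbevf` would flag 2.12.1 as older than 2.14.2. Version sort is used instead of a plain string comparison so that, for example, 2.14.10 correctly sorts after 2.14.2.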

Steps:

  1. create a new HVM instance running CoreOS
  2. stop the instance
  3. enable SR-IOV via the AWS CLI
  4. start the instance
  5. confirm that CoreOS uses the SR-IOV driver. success!
  6. do your perf tests and observe significant improvement
  7. destroy the instance

This works for me:

# launch a new HVM instance on an instance type with 10 Gbps networking

coreos="ami-6f134b5f"
size="r3.8xlarge"

#dryrun="--dry-run"

aws ec2 run-instances $dryrun \
  --count 1 \
  --image-id "$coreos" \
  --instance-type "$size" \
  --key-name redacted \
  --security-group-ids sg-redacted \
  --subnet-id subnet-redacted \
  --block-device-mappings '[{"DeviceName":"/dev/xvda","Ebs":{"VolumeSize":20,"DeleteOnTermination":true}},{"DeviceName":"/dev/xvdb","NoDevice":""}]'

# set id to the new instance ID, wait for it to launch, then stop the instance.
id="redacted"
aws ec2 wait instance-running --instance-ids "$id"
aws ec2 stop-instances --instance-ids "$id"

# the attribute can only be changed once the instance has fully stopped
aws ec2 wait instance-stopped --instance-ids "$id"

# enable sriov
aws ec2 modify-instance-attribute --instance-id "$id" \
    --sriov-net-support simple

# start the instance
aws ec2 start-instances --instance-ids "$id"
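To confirm from the CLI that the attribute actually took effect, the instance attribute can be queried. This is a minimal sketch, assuming the AWS CLI's `describe-instance-attribute --attribute sriovNetSupport` with a `--query` on `SriovNetSupport.Value`; the function name is hypothetical, and since the text the CLI prints for an unset attribute may vary, anything other than `simple` is treated as not enabled.

```bash
# Hypothetical helper: succeed if SR-IOV (enhanced networking) is enabled on
# an EC2 instance. Assumes the AWS CLI is installed and configured.
# Returns 0 if enabled, 1 if not, 2 if the CLI call itself failed.
sriov_enabled() {
  local id="$1" value
  value="$(aws ec2 describe-instance-attribute \
    --instance-id "$id" \
    --attribute sriovNetSupport \
    --query 'SriovNetSupport.Value' \
    --output text)" || return 2
  # The steps above set the attribute to "simple".
  [ "$value" = "simple" ]
}

# Usage (example instance ID):
#   sriov_enabled i-0123456789abcdef0 && echo enabled || echo "not enabled"
```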

Verify that the instance uses the SR-IOV driver:

core@ip-192-168-17-58 ~ $ ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether 0a:83:31:d2:87:2d brd ff:ff:ff:ff:ff:ff

core@ip-192-168-17-58 ~ $ ethtool -i ens3 | grep ixgbevf
driver: ixgbevf
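The `ethtool -i` check can be wrapped for use in scripts. This is a minimal sketch with a hypothetical function name; it reads the `driver:` line that `ethtool -i` prints and defaults to `eth0`, since, as noted further down this thread, the interface may be named `ens3` or `eth0` depending on the release.

```bash
# Hypothetical helper: print the driver bound to an interface and succeed
# only if it is ixgbevf. Defaults to eth0; pass ens3 on older images.
uses_ixgbevf() {
  local iface="${1:-eth0}" driver
  # Pick the value from the "driver: <name>" line of ethtool -i output.
  driver="$(ethtool -i "$iface" | awk -F': ' '$1 == "driver" { print $2 }')"
  echo "$iface: ${driver:-unknown}"
  [ "$driver" = "ixgbevf" ]
}

# Usage:
#   uses_ixgbevf ens3   # prints "ens3: ixgbevf" when SR-IOV is active
#   uses_ixgbevf eth0   # prints "eth0: vif" on the default Xen device
```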

Do performance testing and observe the significant improvement.

Then destroy the instance:

user@devenv:~$ aws ec2 terminate-instances --instance-ids $id

rynbrd commented 9 years ago

I followed @jumanjiman's guide and it works for me.

marineam commented 9 years ago

Note to folks who have manually enabled ixgbevf: in the future, new CoreOS instances will name the device eth0 instead of ens3, for consistency with instances using the default Xen device. https://github.com/coreos/coreos-overlay/pull/1320
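Because the device name differs between releases (`ens3` versus `eth0`), scripts may prefer to find the interface by its driver rather than by a fixed name. This is a minimal sketch, assuming the standard Linux sysfs layout in which `/sys/class/net/<iface>/device/driver` is a symlink to the bound driver; the function name and its optional root argument (there only to make it testable) are hypothetical.

```bash
# Hypothetical helper: print the name of every network interface whose bound
# driver is ixgbevf, using the sysfs driver symlink instead of a fixed name.
# $1 optionally overrides the sysfs root (defaults to /sys).
ixgbevf_interfaces() {
  local root="${1:-/sys}" dev link
  for dev in "$root"/class/net/*; do
    link="$dev/device/driver"
    # Virtual devices such as lo have no device/driver symlink; skip them.
    [ -L "$link" ] || continue
    # The symlink target ends in the driver name, e.g. .../drivers/ixgbevf.
    if [ "$(basename "$(readlink "$link")")" = "ixgbevf" ]; then
      basename "$dev"
    fi
  done
}

# Usage:
#   iface="$(ixgbevf_interfaces | head -n 1)"   # ens3 or eth0, depending on release
```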

marineam commented 9 years ago

This was fixed in https://github.com/coreos/scripts/pull/417 and released: https://coreos.com/releases/#735.0.0

My intent was to enable it only on 735 and later, but it looks like the flag got enabled for the current beta and stable AMIs as well. Hopefully that hasn't caused a problem for anyone :(
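A node can check whether it runs a release at or after 735.0.0, the first release the fix was intended for. This is a minimal sketch, assuming `/etc/os-release` carries a `VERSION=` line like the `633.1.0` shown earlier in this thread; it compares only the leading (major) number, and the function name and file argument are hypothetical. As noted above, some earlier beta and stable AMIs also got the flag, so a lower number does not prove SR-IOV is off.

```bash
# Hypothetical helper: succeed if the CoreOS release recorded in an
# os-release file is 735 or later. Returns 2 if the version can't be read.
# $1 optionally overrides the file path (defaults to /etc/os-release).
release_at_least_735() {
  local file="${1:-/etc/os-release}" version major
  # Source the file inside a command substitution (a subshell), as the
  # thread does with ". /etc/os-release", so its variables don't leak out.
  version="$(. "$file" && echo "$VERSION")" || return 2
  major="${version%%.*}"    # "723.3.0" -> "723"
  case "$major" in
    ''|*[!0-9]*) return 2 ;;
  esac
  [ "$major" -ge 735 ]
}
```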

epipho commented 9 years ago

ixgbevf does appear to be enabled in 723.3 beta. However since coreos/coreos-overlay#1320 isn't in beta yet, creating a new instance using 723.3 beta results in the network device being named ens3 which could be a surprise to some.

laurrentt commented 8 years ago

@crawford I'm currently using CoreOS-stable-835.12.0-hvm on a c4.large with SriovNetSupport enabled. Still my eth0 doesn't seem to be using the proper driver:

core@ip-0-0-0-0 ~ $ ethtool -i eth0
driver: vif
...

Any idea what could be the cause?

SergeyZh commented 8 years ago

I found that the current version of ixgbevf in CoreOS, 2.12.1-k, loses or corrupts a small number of IP packets on Amazon AWS. According to http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html, for the best performance Amazon recommends ixgbevf version 2.14.2 or higher.

Could you please upgrade this driver? It stops us from using new instance types on AWS: the new instances use ixgbevf, and it sometimes causes TCP timeouts. I don't have these problems with the vif adapter on older AWS instance types.

Please note that even the latest stable version of CoreOS has the old ixgbevf driver:

4.5.0-coreos-r1 # ethtool -i eth0
driver: ixgbevf
version: 2.12.1-k

pwaller commented 8 years ago

See https://github.com/coreos/bugs/issues/488