Closed: sw37th closed this issue 6 years ago.
The first part may be a kernel bug, effectively /sys being racy between bump of the number of VFs and them being registered in /sys.
The second part does sound like a LXD bug, we'll need to track down exactly what's going on there.
LXD will set the MAC address on the container's interface, that part is expected, but this should work and I'm unsure why it's showing eth0 there instead of eth1.
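For reference, LXD records the address it assigns in the container's volatile.&lt;device&gt;.hwaddr key, always under the 00:16:3e prefix (see the volatile.eth0.hwaddr values later in this report). A rough sketch of generating an address in that format — illustrative shell only, not LXD's actual code:

```shell
# Illustrative only: produce a random MAC under the 00:16:3e prefix,
# the same format as the volatile.eth0.hwaddr values in this report.
random_lxd_mac() {
    printf '00:16:3e'
    # Read 3 random bytes and print them as colon-separated hex pairs.
    od -An -N3 -tx1 /dev/urandom | awk '{ printf ":%s:%s:%s\n", $1, $2, $3 }'
}
```

Running `random_lxd_mac` prints something like `00:16:3e:ab:39:77`.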
> The first part may be a kernel bug, effectively /sys being racy between bump of the number of VFs and them being registered in /sys.
OK, I'll ignore that error.
> The second part does sound like a LXD bug, we'll need to track down exactly what's going on there.
> LXD will set the MAC address on the container's interface, that part is expected, but this should work and I'm unsure why it's showing eth0 there instead of eth1.
I have found the cause of this issue. It seems that the parent (physical) device needs to be brought UP before its virtual functions are created.
On my LXD host machine, the 10GbE devices are not configured, so they are usually in a DOWN state.
```
% ip a s dev enp4s0f0
4: enp4s0f0: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:1b:21:bc:04:a2 brd ff:ff:ff:ff:ff:ff
```
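The parent's state and VF counts can also be read straight from sysfs. A small sketch; the `SYSFS` override is only there so the logic can be exercised against a fake tree, and the interface name is from this report:

```shell
# Print the parent NIC's operational state and its current/maximum
# VF counts from sysfs. SYSFS is overridable for testing.
vf_status() {
    dev="${SYSFS:-/sys}/class/net/$1"
    echo "state: $(cat "$dev/operstate")"
    echo "numvfs: $(cat "$dev/device/sriov_numvfs")"
    echo "totalvfs: $(cat "$dev/device/sriov_totalvfs")"
}
# e.g. vf_status enp4s0f0
```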
Maybe virtual functions cannot be brought UP while the parent device is in a DOWN state. And maybe a virtual function's MAC address cannot be set while it is in a DOWN state.
```
% sudo sh -c 'echo 1 > /sys/class/net/enp4s0f0/device/sriov_numvfs'
% ip a
(snip)
4: enp4s0f0: mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 00:1b:21:bc:04:a2 brd ff:ff:ff:ff:ff:ff
5: enp4s0f1: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:1b:21:bc:04:a3 brd ff:ff:ff:ff:ff:ff
7: eth0: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether d2:27:13:3d:b1:71 brd ff:ff:ff:ff:ff:ff

% sudo ip link set dev eth0 up
RTNETLINK answers: Network is down

% sudo ip link set dev eth0 address 00:16:3e:ab:39:77
RTNETLINK answers: Operation not permitted

(parent up)
% sudo ip link set dev enp4s0f0 up

(virtual function up)
% sudo ip link set dev eth0 up

(set MAC address)
% sudo ip link set dev eth0 address 00:16:3e:ab:39:77

% ip a s dev eth0
8: eth0: mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:16:3e:ab:39:77 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::3c10:54ff:febc:a2cb/64 scope link
       valid_lft forever preferred_lft forever
```
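As an aside, iproute2 can also assign a VF's MAC through the parent PF (`ip link set dev <pf> vf <n> mac <addr>`), which the driver then enforces on the VF. Whether that sidesteps the DOWN-state restriction on this card is an assumption I have not verified. A sketch:

```shell
# Set a VF's MAC via its parent PF rather than via the VF's own netdev.
# The interface names and VF index are assumptions for illustration.
set_vf_mac_via_pf() {
    parent="$1" vf_index="$2" mac="$3"
    ip link set dev "$parent" vf "$vf_index" mac "$mac"
}
# e.g. set_vf_mac_via_pf enp4s0f0 0 00:16:3e:ab:39:77
```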
If the parent device is brought up before the virtual functions are created, a virtual function can be added to a container successfully.
```
% sudo ip link set dev enp4s0f0 up
% ip a s dev enp4s0f0
3: enp4s0f0: mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:1b:21:bc:04:a2 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::21b:21ff:febc:4a2/64 scope link
       valid_lft forever preferred_lft forever

% lxc launch ubuntu:18.04 mycontainer
Creating mycontainer
Starting mycontainer

% lxc config device add mycontainer eth1 nic nictype=sriov parent=enp4s0f0
Error: open /sys/class/net/enp4s0f0/device/virtfn1/net: no such file or directory

(Ignore this error)
% lxc config device add mycontainer eth1 nic nictype=sriov parent=enp4s0f0
Device eth1 added to mycontainer

% lxc exec mycontainer ip a s dev eth1
9: eth1: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:16:3e:b3:9e:ea brd ff:ff:ff:ff:ff:ff

(Virtual functions are created according to Predictable Network Interface Names)
% ip a
(snip)
9: enp4s16: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:16:3e:b3:9e:ea brd ff:ff:ff:ff:ff:ff
10: enp4s16f2: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 86:92:1a:8a:a6:f1 brd ff:ff:ff:ff:ff:ff
(snip)
70: enp4s31f2: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether be:dc:f7:be:66:df brd ff:ff:ff:ff:ff:ff
71: enp4s31f4: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 16:cd:84:93:ad:d4 brd ff:ff:ff:ff:ff:ff
```
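Putting the two observations together — bring the parent up first, then create the VFs, then wait out the /sys race mentioned earlier — the whole sequence might look like the following sketch. The `SYSFS` override exists only for testing, and the retry limit is arbitrary:

```shell
# Sketch: bring the parent up *before* creating VFs, then poll until
# the last VF's net directory is registered in /sys.
create_vfs() {
    parent="$1"; count="$2"
    sysfs="${SYSFS:-/sys}"

    # Parent must be up so the VFs can later be brought up and
    # have their MAC addresses set.
    ip link set dev "$parent" up

    echo "$count" > "$sysfs/class/net/$parent/device/sriov_numvfs"

    # /sys can lag behind the numvfs bump; wait for the last virtfn entry.
    tries=0
    while [ ! -d "$sysfs/class/net/$parent/device/virtfn$((count - 1))/net" ]; do
        tries=$((tries + 1))
        [ "$tries" -gt 50 ] && return 1
        sleep 0.1
    done
}
# e.g. create_vfs enp4s0f0 1
```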
If the virtual functions are created while the parent device is DOWN, they need some additional configuration before they can be added to a container.
```
% ip a s dev enp4s0f0
3: enp4s0f0: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:1b:21:bc:04:a2 brd ff:ff:ff:ff:ff:ff

% lxc launch ubuntu:18.04 mycontainer
Creating mycontainer
Starting mycontainer

% lxc config device add mycontainer eth1 nic nictype=sriov parent=enp4s0f0
Error: open /sys/class/net/enp4s0f0/device/virtfn1/net: no such file or directory

(Ignore this error)
% lxc config device add mycontainer eth1 nic nictype=sriov parent=enp4s0f0
Error: Failed to set the MAC address: Failed to run: ip link set dev eth0 address 00:16:3e:21:c7:fe: RTNETLINK answers: Operation not permitted

(Virtual functions are created with the traditional interface naming scheme 'ethX')
% ip a
(snip)
9: eth0: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 5e:60:67:b9:d0:fc brd ff:ff:ff:ff:ff:ff
10: eth1: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether b2:72:30:bc:9d:2c brd ff:ff:ff:ff:ff:ff
(snip)
70: eth61: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 2e:9e:d0:c3:65:f2 brd ff:ff:ff:ff:ff:ff
71: eth62: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 66:46:96:0a:78:3c brd ff:ff:ff:ff:ff:ff

(UP the parent device)
% sudo ip link set dev enp4s0f0 up
% ip a s dev enp4s0f0
3: enp4s0f0: mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:1b:21:bc:04:a2 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::21b:21ff:febc:4a2/64 scope link
       valid_lft forever preferred_lft forever

(UP the first free virtual function explicitly)
% sudo ip link set dev eth0 up
% lxc config device add mycontainer eth1 nic nictype=sriov parent=enp4s0f0
Device eth1 added to mycontainer

% lxc exec mycontainer ip a s dev eth1
9: eth1: mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:16:3e:21:c7:fe brd ff:ff:ff:ff:ff:ff
```
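The remedial steps above can be condensed into one helper. The container and interface names come from this report, and treating `eth0` as the first free VF is an assumption that depends on which VFs are still unassigned:

```shell
# Bring the parent and the first free VF up, then add the SR-IOV NIC.
attach_sriov_nic() {
    container="$1" parent="$2" free_vf="$3"
    ip link set dev "$parent" up
    ip link set dev "$free_vf" up
    lxc config device add "$container" eth1 nic nictype=sriov parent="$parent"
}
# e.g. attach_sriov_nic mycontainer enp4s0f0 eth0
```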
Oh, that's interesting and not something I remember seeing on our test hardware, but we were testing on Mellanox whereas you seem to be on Intel.
I'll do another check on our test machine, it'd be nice if I could reproduce the behavior so I can then push a fix. By the sound of it, all it'd take is having us force the parent to be up.
So I've confirmed that Mellanox does not have this behavior. Since my only Intel SRIOV hardware is a machine where I do use the parent NIC, I can't actually test the fix.
I've tried the latest (edge) LXD from the snap. I was able to add a virtual function to a container without bringing up the parent device explicitly. LXD now handles the Intel X520-DA2's SR-IOV devices well. Thanks!
Required information
Distribution: Ubuntu (server)
Distribution version: Ubuntu 18.04.1 LTS server
The output of "lxc info" or if that fails:
SR-IOV enabled device: Intel X520-DA2 (82599ES)
Issue description
I tried to add an SR-IOV device to a running container with the following command:
The first time, it returns the following error:
But the directory /sys/class/net/enp4s0f0/device/virtfn1/net has been created by LXD at this point.
And on the second try, it returns:
It seems that LXD tries to rename the host machine's SR-IOV virtual function device `eth0` to the container's device name `eth1`. But I'm not sure why LXD also tries to set the MAC address of the virtual function.

Steps to reproduce
```
% sudo ethtool enp4s0f0
Settings for enp4s0f0:
	Supported ports: [ FIBRE ]
	Supported link modes:   10000baseT/Full
	Supported pause frame use: Symmetric
	Supports auto-negotiation: No
	Supported FEC modes: Not reported
	Advertised link modes:  10000baseT/Full
	Advertised pause frame use: Symmetric
	Advertised auto-negotiation: No
	Advertised FEC modes: Not reported
	Speed: Unknown!
	Duplex: Unknown! (255)
	Port: Direct Attach Copper
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: off
	Supports Wake-on: d
	Wake-on: d
	Current message level: 0x00000007 (7)
			       drv probe link
	Link detected: no
```
```
% lspci | grep 82599ES
04:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
04:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
```
```
% lxc config show mycontainer --expanded
architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 18.04 LTS amd64 (release) (20180911)
  image.label: release
  image.os: ubuntu
  image.release: bionic
  image.serial: "20180911"
  image.version: "18.04"
  volatile.base_image: c395a7105278712478ec1dbfaab1865593fc11292f99afe01d5b94f1c34a9a3a
  volatile.eth0.hwaddr: 00:16:3e:3d:a4:a8
  volatile.idmap.base: "0"
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.power: RUNNING
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""
```
Add a `nictype=sriov` device as "eth1" to the running container. (But the directory /sys/class/net/enp4s0f0/device/virtfn1/net has been created by LXD at this point.)
```
% ls -ld /sys/class/net/enp4s0f0/device/virtfn1/net
drwxr-xr-x 3 root root 0 Sep 17 18:25 /sys/class/net/enp4s0f0/device/virtfn1/net
% ls -l /sys/class/net/enp4s0f0/device/virtfn1/net
total 0
drwxr-xr-x 5 root root 0 Sep 17 18:25 eth1
```
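As the listing above shows, each virtfnN/net directory contains the host-side interface name of that VF. A sketch that enumerates them all; the `SYSFS` override is only for testing against a fake tree:

```shell
# Print the host-side netdev name behind each virtfn entry of a parent NIC.
list_vf_netdevs() {
    parent="$1"
    sysfs="${SYSFS:-/sys}"
    for vf in "$sysfs/class/net/$parent/device"/virtfn*/net; do
        [ -d "$vf" ] && ls "$vf"
    done
    return 0
}
# e.g. list_vf_netdevs enp4s0f0
```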
(Virtual functions are created)
```
% ip a
(snip)
9: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 92:1a:2a:af:e1:73 brd ff:ff:ff:ff:ff:ff
10: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether ee:b9:b0:57:a3:0e brd ff:ff:ff:ff:ff:ff
(snip)
70: eth61: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether ce:3b:f2:da:9c:cc brd ff:ff:ff:ff:ff:ff
71: eth62: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 16:4e:33:fa:19:2b brd ff:ff:ff:ff:ff:ff
```
(Now mycontainer has properties volatile.eth1.hwaddr and volatile.eth1.name)
```
% lxc config show mycontainer --expanded
architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 18.04 LTS amd64 (release) (20180911)
  image.label: release
  image.os: ubuntu
  image.release: bionic
  image.serial: "20180911"
  image.version: "18.04"
  volatile.base_image: c395a7105278712478ec1dbfaab1865593fc11292f99afe01d5b94f1c34a9a3a
  volatile.eth0.hwaddr: 00:16:3e:3d:a4:a8
  volatile.eth1.hwaddr: 00:16:3e:fd:07:49
  volatile.eth1.name: eth1
  volatile.idmap.base: "0"
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.power: RUNNING
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""
```
Add the `nictype=sriov` device to the container again. Then stop the container, add the sriov device, and start it. It seems that LXD tries to rename the network device "veth9VDXG7" to "eth0".
```
% lxc config device add mycontainer eth1 nic nictype=sriov parent=enp4s0f0
Device eth1 added to mycontainer
```
```
% lxc config show mycontainer --expanded
architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 18.04 LTS amd64 (release) (20180911)
  image.label: release
  image.os: ubuntu
  image.release: bionic
  image.serial: "20180911"
  image.version: "18.04"
  volatile.base_image: c395a7105278712478ec1dbfaab1865593fc11292f99afe01d5b94f1c34a9a3a
  volatile.eth0.hwaddr: 00:16:3e:3d:a4:a8
  volatile.eth1.hwaddr: 00:16:3e:fd:07:49
  volatile.eth1.name: eth1
  volatile.idmap.base: "0"
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.power: STOPPED
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  eth1:
    nictype: sriov
    parent: enp4s0f0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
```
```
% lxc start mycontainer
Error: Failed to run: /usr/lib/lxd/lxd forkstart mycontainer /var/lib/lxd/containers /var/log/lxd/mycontainer/lxc.conf:
Try `lxc info --show-log mycontainer` for more info

% lxc info --show-log mycontainer
Name: mycontainer
Remote: unix://
Architecture: x86_64
Created: 2018/09/17 09:21 UTC
Status: Stopped
Type: persistent
Profiles: default
```
Log:
```
lxc mycontainer 20180917093504.360 ERROR lxc_network - network.c:lxc_setup_netdev_in_child_namespaces:2875 - Failed to rename network device "veth9VDXG7" to "eth0": File exists
lxc mycontainer 20180917093504.360 ERROR lxc_network - network.c:lxc_setup_network_in_child_namespaces:3034 - failed to setup netdev
lxc mycontainer 20180917093504.360 ERROR lxc_conf - conf.c:lxc_setup:3389 - Failed to setup network
lxc mycontainer 20180917093504.360 ERROR lxc_start - start.c:do_start:1219 - Failed to setup container "mycontainer"
lxc mycontainer 20180917093504.360 ERROR lxc_sync - sync.c:sync_wait:57 - An error occurred in another process (expected sequence number 5)
lxc mycontainer 20180917093504.428 WARN lxc_network - network.c:lxc_delete_network_priv:2556 - Failed to rename interface with index 9 from "eth1" to its initial name "eth0"
lxc mycontainer 20180917093504.428 ERROR lxc_start - start.c:lxc_start:1887 - Failed to spawn container "mycontainer"
lxc mycontainer 20180917093504.428 ERROR lxc_container - lxccontainer.c:wait_on_daemonized_start:834 - Received container state "ABORTING" instead of "RUNNING"
lxc 20180917093504.438 WARN lxc_commands - commands.c:lxc_cmd_rsp_recv:130 - Connection reset by peer - Failed to receive response for command "get_state"
```
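The "File exists" rename failure suggests an interface named eth0 already exists inside the container's namespace when LXC tries to rename the veth. A hypothetical pre-flight check (not part of LXD, purely illustrative) could be:

```shell
# Return success if the container already has an interface by that name.
container_has_iface() {
    lxc exec "$1" -- ip link show "$2" >/dev/null 2>&1
}
# e.g. container_has_iface mycontainer eth0 && echo "name already taken"
```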
Switch the profile from `default` to a dummy profile `nonic` which has no network device. It seems that it fails to set up the hw address for network device "eth1".

Information to attach
[ ] Any relevant kernel output (`dmesg`)
[ ] Container log (`lxc info NAME --show-log`) Please see 'Steps to reproduce'
[ ] Container configuration (`lxc config show NAME --expanded`) Please see 'Steps to reproduce'
[ ] Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
[ ] Output of the client with --debug
[ ] Output of the daemon with --debug (alternatively output of `lxc monitor` while reproducing the issue)

metadata: context: ip: '@' method: GET url: /1.0 level: dbug message: handling timestamp: "2018-09-17T19:01:43.33375673+09:00" type: logging
metadata: context: ip: '@' method: GET url: /1.0/containers/mycontainer level: dbug message: handling timestamp: "2018-09-17T19:01:43.336362077+09:00" type: logging
metadata: context: {} level: dbug message: 'New event listener: fd74f805-8347-4d8e-8be7-f599dc9a0add' timestamp: "2018-09-17T19:01:43.34282792+09:00" type: logging
metadata: context: ip: '@' method: GET url: /1.0/events level: dbug message: handling timestamp: "2018-09-17T19:01:43.342735668+09:00" type: logging
metadata: context: ip: '@' method: PUT url: /1.0/containers/mycontainer/state level: dbug message: handling timestamp: "2018-09-17T19:01:43.344078236+09:00" type: logging
metadata: context: {} level: dbug message: 'New task operation: 174d2384-463e-42c5-94e6-5040e1ec7d80' timestamp: "2018-09-17T19:01:43.398930855+09:00" type: logging
metadata: class: task created_at: "2018-09-17T19:01:43.350178884+09:00" description: Starting container err: "" id: 174d2384-463e-42c5-94e6-5040e1ec7d80 may_cancel: false metadata: null resources: containers:
metadata: context: {} level: dbug message: 'Started task operation: 174d2384-463e-42c5-94e6-5040e1ec7d80' timestamp: "2018-09-17T19:01:43.399000369+09:00" type: logging
metadata: class: task created_at: "2018-09-17T19:01:43.350178884+09:00" description: Starting container err: "" id: 174d2384-463e-42c5-94e6-5040e1ec7d80 may_cancel: false metadata: null resources: containers:
metadata: context: ip: '@' method: GET url: /1.0/operations/174d2384-463e-42c5-94e6-5040e1ec7d80 level: dbug message: handling timestamp: "2018-09-17T19:01:43.400456804+09:00" type: logging
metadata: context: {} level: dbug message: Initializing a BTRFS driver. timestamp: "2018-09-17T19:01:43.412834248+09:00" type: logging
metadata: action: container-updated source: /1.0/containers/mycontainer timestamp: "2018-09-17T19:01:43.461718781+09:00" type: lifecycle
metadata: context: {} level: dbug message: Mounting BTRFS storage volume for container "mycontainer" on storage pool "default". timestamp: "2018-09-17T19:01:43.462473985+09:00" type: logging
metadata: context: {} level: dbug message: Mounting BTRFS storage pool "default". timestamp: "2018-09-17T19:01:43.462505122+09:00" type: logging
metadata: context: {} level: dbug message: Mounted BTRFS storage volume for container "mycontainer" on storage pool "default". timestamp: "2018-09-17T19:01:43.462771857+09:00" type: logging
metadata: context: {} level: dbug message: Mounting BTRFS storage pool "default". timestamp: "2018-09-17T19:01:43.507641996+09:00" type: logging
metadata: context: {} level: dbug message: Mounting BTRFS storage volume for container "mycontainer" on storage pool "default". timestamp: "2018-09-17T19:01:43.507611928+09:00" type: logging
metadata: context: {} level: dbug message: Mounted BTRFS storage volume for container "mycontainer" on storage pool "default". timestamp: "2018-09-17T19:01:43.507993046+09:00" type: logging
metadata: context: action: start created: 2018-09-17 18:21:15 +0900 JST ephemeral: "false" name: mycontainer stateful: "false" used: 2018-09-17 18:57:23 +0900 JST level: info message: Starting container timestamp: "2018-09-17T19:01:43.508099952+09:00" type: logging
metadata: context: ip: '@' method: GET url: /1.0 level: dbug message: handling timestamp: "2018-09-17T19:01:43.530738919+09:00" type: logging
metadata: context: ip: '@' method: GET url: /internal/containers/6/onstart level: dbug message: handling timestamp: "2018-09-17T19:01:43.533392397+09:00" type: logging
metadata: context: {} level: dbug message: Initializing a BTRFS driver. timestamp: "2018-09-17T19:01:43.546040756+09:00" type: logging
metadata: context: {} level: dbug message: Mounting BTRFS storage pool "default". timestamp: "2018-09-17T19:01:43.546118774+09:00" type: logging
metadata: context: {} level: dbug message: Mounting BTRFS storage volume for container "mycontainer" on storage pool "default". timestamp: "2018-09-17T19:01:43.546090483+09:00" type: logging
metadata: context: {} level: dbug message: Mounted BTRFS storage volume for container "mycontainer" on storage pool "default". timestamp: "2018-09-17T19:01:43.546434099+09:00" type: logging
metadata: context: {} level: dbug message: 'Scheduler: container mycontainer started: re-balancing' timestamp: "2018-09-17T19:01:43.547732663+09:00" type: logging
metadata: context: {} level: dbug message: 'Failure for task operation: 174d2384-463e-42c5-94e6-5040e1ec7d80: Failed to run: /usr/lib/lxd/lxd forkstart mycontainer /var/lib/lxd/containers /var/log/lxd/mycontainer/lxc.conf: ' timestamp: "2018-09-17T19:01:43.727671396+09:00" type: logging
metadata: context: action: start created: 2018-09-17 18:21:15 +0900 JST ephemeral: "false" name: mycontainer stateful: "false" used: 2018-09-17 18:57:23 +0900 JST level: eror message: Failed starting container timestamp: "2018-09-17T19:01:43.727581966+09:00" type: logging
metadata: class: task created_at: "2018-09-17T19:01:43.350178884+09:00" description: Starting container err: 'Failed to run: /usr/lib/lxd/lxd forkstart mycontainer /var/lib/lxd/containers /var/log/lxd/mycontainer/lxc.conf: ' id: 174d2384-463e-42c5-94e6-5040e1ec7d80 may_cancel: false metadata: null resources: containers:
metadata: context: ip: '@' method: GET url: /1.0 level: dbug message: handling timestamp: "2018-09-17T19:01:44.246959005+09:00" type: logging
metadata: context: {} level: dbug message: 'Disconnected event listener: fd74f805-8347-4d8e-8be7-f599dc9a0add' timestamp: "2018-09-17T19:01:44.247094353+09:00" type: logging
metadata: context: ip: '@' method: GET url: /internal/containers/6/onstop?target=stop level: dbug message: handling timestamp: "2018-09-17T19:01:44.249238762+09:00" type: logging
metadata: context: {} level: dbug message: Initializing a BTRFS driver. timestamp: "2018-09-17T19:01:44.260199645+09:00" type: logging
metadata: context: action: stop created: 2018-09-17 18:21:15 +0900 JST ephemeral: "false" name: mycontainer stateful: "false" used: 2018-09-17 19:01:43 +0900 JST level: info message: Container initiated stop timestamp: "2018-09-17T19:01:44.2603251+09:00" type: logging
metadata: context: {} level: dbug message: 'Scheduler: container mycontainer stopped: re-balancing' timestamp: "2018-09-17T19:01:44.2608741+09:00" type: logging
metadata: context: {} level: dbug message: 'Scheduler: network: dev9 has been added: updating network priorities' timestamp: "2018-09-17T19:01:44.348180378+09:00" type: logging
^C