canonical / lxd

Cannot work nictype=sriov on Ubuntu 18.04.1 LTS #5041

Closed sw37th closed 6 years ago

sw37th commented 6 years ago

Required information

Issue description

I tried to add an SR-IOV device to a running container with the following command:

% lxc config device add container eth1 nic nictype=sriov parent=enp4s0f0

The first time, it returns the following error:

% lxc config device add container eth1 nic nictype=sriov parent=enp4s0f0
Error: open /sys/class/net/enp4s0f0/device/virtfn1/net: no such file or directory

However, the directory /sys/class/net/enp4s0f0/device/virtfn1/net has been created by LXD by this time.

From the second attempt onward, it returns:

% lxc config device add container eth1 nic nictype=sriov parent=enp4s0f0
Error: Failed to set the MAC address: Failed to run: ip link set dev eth0 address 00:16:3e:fd:07:49: RTNETLINK answers: Operation not permitted

It seems that LXD tries to rename the host's SR-IOV virtual function device from eth0 to the container's device name eth1. But I'm not sure why LXD also tries to set the MAC address on the virtual function.

Steps to reproduce

  1. My SR-IOV device information:
    % ip a s dev enp4s0f0
    3: enp4s0f0:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:1b:21:bc:04:a2 brd ff:ff:ff:ff:ff:ff

% sudo ethtool enp4s0f0
Settings for enp4s0f0:
    Supported ports: [ FIBRE ]
    Supported link modes:   10000baseT/Full
    Supported pause frame use: Symmetric
    Supports auto-negotiation: No
    Supported FEC modes: Not reported
    Advertised link modes:  10000baseT/Full
    Advertised pause frame use: Symmetric
    Advertised auto-negotiation: No
    Advertised FEC modes: Not reported
    Speed: Unknown!
    Duplex: Unknown! (255)
    Port: Direct Attach Copper
    PHYAD: 0
    Transceiver: internal
    Auto-negotiation: off
    Supports Wake-on: d
    Wake-on: d
    Current message level: 0x00000007 (7)
                           drv probe link
    Link detected: no

% lspci | grep 82599ES
04:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
04:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)

  1. Create a container
    % lxc launch ubuntu:18.04 mycontainer
    Creating mycontainer
    Starting mycontainer

% lxc config show mycontainer --expanded
architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 18.04 LTS amd64 (release) (20180911)
  image.label: release
  image.os: ubuntu
  image.release: bionic
  image.serial: "20180911"
  image.version: "18.04"
  volatile.base_image: c395a7105278712478ec1dbfaab1865593fc11292f99afe01d5b94f1c34a9a3a
  volatile.eth0.hwaddr: 00:16:3e:3d:a4:a8
  volatile.idmap.base: "0"
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.power: RUNNING
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:

(However, the directory /sys/class/net/enp4s0f0/device/virtfn1/net has been created by LXD by this time.)

% ls -ld /sys/class/net/enp4s0f0/device/virtfn1/net
drwxr-xr-x 3 root root 0 Sep 17 18:25 /sys/class/net/enp4s0f0/device/virtfn1/net
% ls -l /sys/class/net/enp4s0f0/device/virtfn1/net
total 0
drwxr-xr-x 5 root root 0 Sep 17 18:25 eth1

(Virtual functions are created)

% ip a
(snip)
9: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 92:1a:2a:af:e1:73 brd ff:ff:ff:ff:ff:ff
10: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether ee:b9:b0:57:a3:0e brd ff:ff:ff:ff:ff:ff
(snip)
70: eth61: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether ce:3b:f2:da:9c:cc brd ff:ff:ff:ff:ff:ff
71: eth62: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 16:4e:33:fa:19:2b brd ff:ff:ff:ff:ff:ff

(Now mycontainer has properties volatile.eth1.hwaddr and volatile.eth1.name)

% lxc config show mycontainer --expanded
architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 18.04 LTS amd64 (release) (20180911)
  image.label: release
  image.os: ubuntu
  image.release: bionic
  image.serial: "20180911"
  image.version: "18.04"
  volatile.base_image: c395a7105278712478ec1dbfaab1865593fc11292f99afe01d5b94f1c34a9a3a
  volatile.eth0.hwaddr: 00:16:3e:3d:a4:a8
  volatile.eth1.hwaddr: 00:16:3e:fd:07:49
  volatile.eth1.name: eth1
  volatile.idmap.base: "0"
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.power: RUNNING
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:

% lxc config device add mycontainer eth1 nic nictype=sriov parent=enp4s0f0
Device eth1 added to mycontainer

% lxc config show mycontainer --expanded
architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 18.04 LTS amd64 (release) (20180911)
  image.label: release
  image.os: ubuntu
  image.release: bionic
  image.serial: "20180911"
  image.version: "18.04"
  volatile.base_image: c395a7105278712478ec1dbfaab1865593fc11292f99afe01d5b94f1c34a9a3a
  volatile.eth0.hwaddr: 00:16:3e:3d:a4:a8
  volatile.eth1.hwaddr: 00:16:3e:fd:07:49
  volatile.eth1.name: eth1
  volatile.idmap.base: "0"
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.power: STOPPED
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  eth1:
    nictype: sriov
    parent: enp4s0f0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:

% lxc start mycontainer
Error: Failed to run: /usr/lib/lxd/lxd forkstart mycontainer /var/lib/lxd/containers /var/log/lxd/mycontainer/lxc.conf:
Try `lxc info --show-log mycontainer` for more info

% lxc info --show-log mycontainer
Name: mycontainer
Remote: unix://
Architecture: x86_64
Created: 2018/09/17 09:21 UTC
Status: Stopped
Type: persistent
Profiles: default

Log:

lxc mycontainer 20180917093504.360 ERROR    lxc_network - network.c:lxc_setup_netdev_in_child_namespaces:2875 - Failed to rename network device "veth9VDXG7" to "eth0": File exists
lxc mycontainer 20180917093504.360 ERROR    lxc_network - network.c:lxc_setup_network_in_child_namespaces:3034 - failed to setup netdev
lxc mycontainer 20180917093504.360 ERROR    lxc_conf - conf.c:lxc_setup:3389 - Failed to setup network
lxc mycontainer 20180917093504.360 ERROR    lxc_start - start.c:do_start:1219 - Failed to setup container "mycontainer"
lxc mycontainer 20180917093504.360 ERROR    lxc_sync - sync.c:__sync_wait:57 - An error occurred in another process (expected sequence number 5)
lxc mycontainer 20180917093504.428 WARN     lxc_network - network.c:lxc_delete_network_priv:2556 - Failed to rename interface with index 9 from "eth1" to its initial name "eth0"
lxc mycontainer 20180917093504.428 ERROR    lxc_start - start.c:__lxc_start:1887 - Failed to spawn container "mycontainer"
lxc mycontainer 20180917093504.428 ERROR    lxc_container - lxccontainer.c:wait_on_daemonized_start:834 - Received container state "ABORTING" instead of "RUNNING"
lxc 20180917093504.438 WARN     lxc_commands - commands.c:lxc_cmd_rsp_recv:130 - Connection reset by peer - Failed to receive response for command "get_state"

  1. Switch the profile from default to the dummy profile nonic, which has no network device. The start now seems to fail while setting up the hw address for network device "eth1".
% lxc profile list
+---------+---------+
|  NAME   | USED BY |
+---------+---------+
| default | 1       |
+---------+---------+
| nonic   | 0       |
+---------+---------+

% lxc profile show nonic
config: {}
description: LXD profile without nic device
devices:
  root:
    path: /
    pool: default
    type: disk
name: nonic
used_by: []

% lxc config edit mycontainer
% lxc config show mycontainer --expanded
architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 18.04 LTS amd64 (release) (20180911)
  image.label: release
  image.os: ubuntu
  image.release: bionic
  image.serial: "20180911"
  image.version: "18.04"
  volatile.base_image: c395a7105278712478ec1dbfaab1865593fc11292f99afe01d5b94f1c34a9a3a
  volatile.eth1.hwaddr: 00:16:3e:fd:07:49
  volatile.eth1.name: eth1
  volatile.idmap.base: "0"
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.last_state.power: STOPPED
devices:
  eth1:
    nictype: sriov
    parent: enp4s0f0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- nonic
stateful: false
description: ""

% lxc start mycontainer
Error: Failed to run: /usr/lib/lxd/lxd forkstart mycontainer /var/lib/lxd/containers /var/log/lxd/mycontainer/lxc.conf:
Try `lxc info --show-log mycontainer` for more info

% lxc info --show-log mycontainer
Name: mycontainer
Remote: unix://
Architecture: x86_64
Created: 2018/09/17 09:21 UTC
Status: Stopped
Type: persistent
Profiles: nonic

Log:

lxc mycontainer 20180917094353.196 ERROR    lxc_network - network.c:setup_hw_addr:2758 - Failed to perform ioctl: Operation not permitted
lxc mycontainer 20180917094353.196 ERROR    lxc_network - network.c:lxc_setup_netdev_in_child_namespaces:2899 - Failed to setup hw address for network device "eth1"
lxc mycontainer 20180917094353.196 ERROR    lxc_network - network.c:lxc_setup_network_in_child_namespaces:3034 - failed to setup netdev
lxc mycontainer 20180917094353.196 ERROR    lxc_conf - conf.c:lxc_setup:3389 - Failed to setup network
lxc mycontainer 20180917094353.196 ERROR    lxc_start - start.c:do_start:1219 - Failed to setup container "mycontainer"
lxc mycontainer 20180917094353.196 ERROR    lxc_sync - sync.c:__sync_wait:57 - An error occurred in another process (expected sequence number 5)
lxc mycontainer 20180917094353.196 WARN     lxc_network - network.c:lxc_delete_network_priv:2556 - Failed to rename interface with index 9 from "eth1" to its initial name "eth0"
lxc mycontainer 20180917094353.196 ERROR    lxc_container - lxccontainer.c:wait_on_daemonized_start:834 - Received container state "ABORTING" instead of "RUNNING"
lxc mycontainer 20180917094353.196 ERROR    lxc_start - start.c:__lxc_start:1887 - Failed to spawn container "mycontainer"
lxc 20180917094353.206 WARN     lxc_commands - commands.c:lxc_cmd_rsp_recv:130 - Connection reset by peer - Failed to receive response for command "get_state"
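
To see which host interface backs each virtual function, the sysfs layout used in the steps above (/sys/class/net/PARENT/device/virtfnN/net) can be walked with a small shell function. This is only a sketch: `list_vf_netdevs` and the `SYSFS_ROOT` override are hypothetical names introduced here so the function can be pointed at a fake directory tree; neither is part of LXD.

```shell
# Sketch only: list_vf_netdevs and SYSFS_ROOT are hypothetical names for
# this example. SYSFS_ROOT defaults to /sys on a real host.

# Print "virtfnN <netdev>" for every VF of the given parent interface
# that currently has a network device registered in sysfs.
list_vf_netdevs() {
    local parent="$1"
    local vf name
    for vf in "${SYSFS_ROOT:-/sys}/class/net/$parent/device"/virtfn*; do
        # Skip VFs (or an unmatched glob) without a "net" directory.
        [ -d "$vf/net" ] || continue
        for name in "$vf/net"/*; do
            [ -e "$name" ] || continue
            printf '%s %s\n' "${vf##*/}" "${name##*/}"
        done
    done
}
```

On the host in this report, `list_vf_netdevs enp4s0f0` would be expected to print one line per VF, for example `virtfn1 eth1` for the directory listed above.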

Information to attach

metadata: context: ip: '@' method: GET url: /1.0 level: dbug message: handling timestamp: "2018-09-17T19:01:43.33375673+09:00" type: logging

metadata: context: ip: '@' method: GET url: /1.0/containers/mycontainer level: dbug message: handling timestamp: "2018-09-17T19:01:43.336362077+09:00" type: logging

metadata: context: {} level: dbug message: 'New event listener: fd74f805-8347-4d8e-8be7-f599dc9a0add' timestamp: "2018-09-17T19:01:43.34282792+09:00" type: logging

metadata: context: ip: '@' method: GET url: /1.0/events level: dbug message: handling timestamp: "2018-09-17T19:01:43.342735668+09:00" type: logging

metadata: context: ip: '@' method: PUT url: /1.0/containers/mycontainer/state level: dbug message: handling timestamp: "2018-09-17T19:01:43.344078236+09:00" type: logging

metadata: context: {} level: dbug message: 'New task operation: 174d2384-463e-42c5-94e6-5040e1ec7d80' timestamp: "2018-09-17T19:01:43.398930855+09:00" type: logging

metadata: class: task created_at: "2018-09-17T19:01:43.350178884+09:00" description: Starting container err: "" id: 174d2384-463e-42c5-94e6-5040e1ec7d80 may_cancel: false metadata: null resources: containers:

metadata: context: {} level: dbug message: 'Started task operation: 174d2384-463e-42c5-94e6-5040e1ec7d80' timestamp: "2018-09-17T19:01:43.399000369+09:00" type: logging

metadata: class: task created_at: "2018-09-17T19:01:43.350178884+09:00" description: Starting container err: "" id: 174d2384-463e-42c5-94e6-5040e1ec7d80 may_cancel: false metadata: null resources: containers:

metadata: context: ip: '@' method: GET url: /1.0/operations/174d2384-463e-42c5-94e6-5040e1ec7d80 level: dbug message: handling timestamp: "2018-09-17T19:01:43.400456804+09:00" type: logging

metadata: context: {} level: dbug message: Initializing a BTRFS driver. timestamp: "2018-09-17T19:01:43.412834248+09:00" type: logging

metadata: action: container-updated source: /1.0/containers/mycontainer timestamp: "2018-09-17T19:01:43.461718781+09:00" type: lifecycle

metadata: context: {} level: dbug message: Mounting BTRFS storage volume for container "mycontainer" on storage pool "default". timestamp: "2018-09-17T19:01:43.462473985+09:00" type: logging

metadata: context: {} level: dbug message: Mounting BTRFS storage pool "default". timestamp: "2018-09-17T19:01:43.462505122+09:00" type: logging

metadata: context: {} level: dbug message: Mounted BTRFS storage volume for container "mycontainer" on storage pool "default". timestamp: "2018-09-17T19:01:43.462771857+09:00" type: logging

metadata: context: {} level: dbug message: Mounting BTRFS storage pool "default". timestamp: "2018-09-17T19:01:43.507641996+09:00" type: logging

metadata: context: {} level: dbug message: Mounting BTRFS storage volume for container "mycontainer" on storage pool "default". timestamp: "2018-09-17T19:01:43.507611928+09:00" type: logging

metadata: context: {} level: dbug message: Mounted BTRFS storage volume for container "mycontainer" on storage pool "default". timestamp: "2018-09-17T19:01:43.507993046+09:00" type: logging

metadata: context: action: start created: 2018-09-17 18:21:15 +0900 JST ephemeral: "false" name: mycontainer stateful: "false" used: 2018-09-17 18:57:23 +0900 JST level: info message: Starting container timestamp: "2018-09-17T19:01:43.508099952+09:00" type: logging

metadata: context: ip: '@' method: GET url: /1.0 level: dbug message: handling timestamp: "2018-09-17T19:01:43.530738919+09:00" type: logging

metadata: context: ip: '@' method: GET url: /internal/containers/6/onstart level: dbug message: handling timestamp: "2018-09-17T19:01:43.533392397+09:00" type: logging

metadata: context: {} level: dbug message: Initializing a BTRFS driver. timestamp: "2018-09-17T19:01:43.546040756+09:00" type: logging

metadata: context: {} level: dbug message: Mounting BTRFS storage pool "default". timestamp: "2018-09-17T19:01:43.546118774+09:00" type: logging

metadata: context: {} level: dbug message: Mounting BTRFS storage volume for container "mycontainer" on storage pool "default". timestamp: "2018-09-17T19:01:43.546090483+09:00" type: logging

metadata: context: {} level: dbug message: Mounted BTRFS storage volume for container "mycontainer" on storage pool "default". timestamp: "2018-09-17T19:01:43.546434099+09:00" type: logging

metadata: context: {} level: dbug message: 'Scheduler: container mycontainer started: re-balancing' timestamp: "2018-09-17T19:01:43.547732663+09:00" type: logging

metadata: context: {} level: dbug message: 'Failure for task operation: 174d2384-463e-42c5-94e6-5040e1ec7d80: Failed to run: /usr/lib/lxd/lxd forkstart mycontainer /var/lib/lxd/containers /var/log/lxd/mycontainer/lxc.conf: ' timestamp: "2018-09-17T19:01:43.727671396+09:00" type: logging

metadata: context: action: start created: 2018-09-17 18:21:15 +0900 JST ephemeral: "false" name: mycontainer stateful: "false" used: 2018-09-17 18:57:23 +0900 JST level: eror message: Failed starting container timestamp: "2018-09-17T19:01:43.727581966+09:00" type: logging

metadata: class: task created_at: "2018-09-17T19:01:43.350178884+09:00" description: Starting container err: 'Failed to run: /usr/lib/lxd/lxd forkstart mycontainer /var/lib/lxd/containers /var/log/lxd/mycontainer/lxc.conf: ' id: 174d2384-463e-42c5-94e6-5040e1ec7d80 may_cancel: false metadata: null resources: containers:

metadata: context: ip: '@' method: GET url: /1.0 level: dbug message: handling timestamp: "2018-09-17T19:01:44.246959005+09:00" type: logging

metadata: context: {} level: dbug message: 'Disconnected event listener: fd74f805-8347-4d8e-8be7-f599dc9a0add' timestamp: "2018-09-17T19:01:44.247094353+09:00" type: logging

metadata: context: ip: '@' method: GET url: /internal/containers/6/onstop?target=stop level: dbug message: handling timestamp: "2018-09-17T19:01:44.249238762+09:00" type: logging

metadata: context: {} level: dbug message: Initializing a BTRFS driver. timestamp: "2018-09-17T19:01:44.260199645+09:00" type: logging

metadata: context: action: stop created: 2018-09-17 18:21:15 +0900 JST ephemeral: "false" name: mycontainer stateful: "false" used: 2018-09-17 19:01:43 +0900 JST level: info message: Container initiated stop timestamp: "2018-09-17T19:01:44.2603251+09:00" type: logging

metadata: context: {} level: dbug message: 'Scheduler: container mycontainer stopped: re-balancing' timestamp: "2018-09-17T19:01:44.2608741+09:00" type: logging

metadata: context: {} level: dbug message: 'Scheduler: network: dev9 has been added: updating network priorities' timestamp: "2018-09-17T19:01:44.348180378+09:00" type: logging

^C

stgraber commented 6 years ago

The first part may be a kernel bug, effectively /sys being racy between bump of the number of VFs and them being registered in /sys.

The second part does sound like a LXD bug, we'll need to track down exactly what's going on there.

LXD will set the MAC address on the container's interface, that part is expected, but this should work and I'm unsure why it's showing eth0 there instead of eth1.
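
For anyone hitting the first error by hand, the suspected race can be worked around by polling sysfs until the VF's net directory is populated. The following is a minimal sketch under that assumption, not how LXD handles it: `wait_for_vf_netdev` and `SYSFS_ROOT` are hypothetical names, and the 0.1 s poll interval is arbitrary.

```shell
# Sketch only: wait_for_vf_netdev and SYSFS_ROOT are hypothetical names.
# SYSFS_ROOT defaults to /sys on a real host.

# Wait until /sys/class/net/<parent>/device/virtfn<N>/net contains an
# interface, then print its name. Gives up after <tries> polls.
wait_for_vf_netdev() {
    local parent="$1" vf_index="$2" tries="${3:-50}"
    local dir="${SYSFS_ROOT:-/sys}/class/net/$parent/device/virtfn$vf_index/net"
    while [ "$tries" -gt 0 ]; do
        # The directory must exist and hold at least one entry.
        if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
            ls "$dir"
            return 0
        fi
        sleep 0.1
        tries=$((tries - 1))
    done
    echo "timed out waiting for $dir" >&2
    return 1
}
```

For the device in this report, `wait_for_vf_netdev enp4s0f0 1` would wait for the virtfn1/net path named in the first error message.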

sw37th commented 6 years ago

> The first part may be a kernel bug, effectively /sys being racy between bump of the number of VFs and them being registered in /sys.

OK, I'll ignore that error.

> The second part does sound like a LXD bug, we'll need to track down exactly what's going on there.
>
> LXD will set the MAC address on the container's interface, that part is expected, but this should work and I'm unsure why it's showing eth0 there instead of eth1.

I have found the cause of this issue. It seems that the parent (physical) device needs to be brought UP before its virtual functions are created.

On my LXD host machine, the 10GbE devices are not configured, so they are usually in a DOWN state.

% ip a s dev enp4s0f0
4: enp4s0f0:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:1b:21:bc:04:a2 brd ff:ff:ff:ff:ff:ff

Maybe virtual functions cannot be brought UP while the parent device is in a DOWN state. And maybe a virtual function's MAC address cannot be set while it is in a DOWN state.

% sudo sh -c 'echo 1 > /sys/class/net/enp4s0f0/device/sriov_numvfs'

% ip a
(snip)
4: enp4s0f0:  mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 00:1b:21:bc:04:a2 brd ff:ff:ff:ff:ff:ff
5: enp4s0f1:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:1b:21:bc:04:a3 brd ff:ff:ff:ff:ff:ff
7: eth0:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether d2:27:13:3d:b1:71 brd ff:ff:ff:ff:ff:ff

% sudo ip link set dev eth0 up
RTNETLINK answers: Network is down

% sudo ip link set dev eth0 address 00:16:3e:ab:39:77
RTNETLINK answers: Operation not permitted

(parent up)
% sudo ip link set dev enp4s0f0 up

(virtual function up)
% sudo ip link set dev eth0 up

(set MAC address)
% sudo ip link set dev eth0 address 00:16:3e:ab:39:77

% ip a s dev eth0
8: eth0:  mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:16:3e:ab:39:77 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::3c10:54ff:febc:a2cb/64 scope link
       valid_lft forever preferred_lft forever
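
The manual write to sriov_numvfs used above can be wrapped in a small function that checks the requested count against sriov_totalvfs first. This is a sketch under assumptions: `set_numvfs` and `SYSFS_ROOT` are hypothetical names, and the reset to 0 reflects the kernel's documented behavior of refusing to change directly from one non-zero VF count to another.

```shell
# Sketch only: set_numvfs and SYSFS_ROOT are hypothetical names.
# SYSFS_ROOT defaults to /sys; writing there requires root.

# Set the number of VFs on <parent> to <want>.
set_numvfs() {
    local parent="$1" want="$2"
    local dev="${SYSFS_ROOT:-/sys}/class/net/$parent/device"
    local total current
    total=$(cat "$dev/sriov_totalvfs") || return 1
    current=$(cat "$dev/sriov_numvfs") || return 1
    if [ "$want" -gt "$total" ]; then
        echo "$parent supports at most $total VFs" >&2
        return 1
    fi
    # Nothing to do if the requested count is already active.
    [ "$current" -eq "$want" ] && return 0
    # Drop to 0 before switching between two non-zero counts.
    if [ "$current" -ne 0 ]; then
        echo 0 > "$dev/sriov_numvfs" || return 1
    fi
    if [ "$want" -ne 0 ]; then
        echo "$want" > "$dev/sriov_numvfs"
    fi
}
```

Run as root, `set_numvfs enp4s0f0 1` corresponds to the `echo 1 > /sys/class/net/enp4s0f0/device/sriov_numvfs` command shown above.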

If the parent device is brought up before the virtual functions are created, a virtual function can be added to the container successfully.

% sudo ip link set dev enp4s0f0 up

% ip a s dev enp4s0f0
3: enp4s0f0:  mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:1b:21:bc:04:a2 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::21b:21ff:febc:4a2/64 scope link
       valid_lft forever preferred_lft forever

% lxc launch ubuntu:18.04 mycontainer
Creating mycontainer
Starting mycontainer

% lxc config device add mycontainer eth1 nic nictype=sriov parent=enp4s0f0
Error: open /sys/class/net/enp4s0f0/device/virtfn1/net: no such file or directory
(Ignore this error)

% lxc config device add mycontainer eth1 nic nictype=sriov parent=enp4s0f0
Device eth1 added to mycontainer

% lxc exec mycontainer ip a s dev eth1
9: eth1:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:16:3e:b3:9e:ea brd ff:ff:ff:ff:ff:ff

(Virtual functions are created according to Predictable Network Interface Names)
% ip a
(snip)
9: enp4s16:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:16:3e:b3:9e:ea brd ff:ff:ff:ff:ff:ff
10: enp4s16f2:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 86:92:1a:8a:a6:f1 brd ff:ff:ff:ff:ff:ff
(snip)
70: enp4s31f2:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether be:dc:f7:be:66:df brd ff:ff:ff:ff:ff:ff
71: enp4s31f4:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 16:cd:84:93:ad:d4 brd ff:ff:ff:ff:ff:ff

If the virtual functions are created while the parent device is DOWN, they need some additional configuration before they can be added to a container.

% ip a s dev enp4s0f0
3: enp4s0f0:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:1b:21:bc:04:a2 brd ff:ff:ff:ff:ff:ff

% lxc launch ubuntu:18.04 mycontainer
Creating mycontainer
Starting mycontainer

% lxc config device add mycontainer eth1 nic nictype=sriov parent=enp4s0f0
Error: open /sys/class/net/enp4s0f0/device/virtfn1/net: no such file or directory
(Ignore this error)

% lxc config device add mycontainer eth1 nic nictype=sriov parent=enp4s0f0
Error: Failed to set the MAC address: Failed to run: ip link set dev eth0 address 00:16:3e:21:c7:fe: RTNETLINK answers: Operation not permitted

(Virtual functions are created with the traditional interface naming scheme 'ethX')
% ip a
(snip)
9: eth0:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 5e:60:67:b9:d0:fc brd ff:ff:ff:ff:ff:ff
10: eth1:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether b2:72:30:bc:9d:2c brd ff:ff:ff:ff:ff:ff
(snip)
70: eth61:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 2e:9e:d0:c3:65:f2 brd ff:ff:ff:ff:ff:ff
71: eth62:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 66:46:96:0a:78:3c brd ff:ff:ff:ff:ff:ff

(UP the parent device)
% sudo ip link set dev enp4s0f0 up

% ip a s dev enp4s0f0
3: enp4s0f0:  mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:1b:21:bc:04:a2 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::21b:21ff:febc:4a2/64 scope link
       valid_lft forever preferred_lft forever

(UP the first free virtual function explicitly)
% sudo ip link set dev eth0 up

% lxc config device add mycontainer eth1 nic nictype=sriov parent=enp4s0f0
Device eth1 added to mycontainer

% lxc exec mycontainer ip a s dev eth1
9: eth1:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:16:3e:21:c7:fe brd ff:ff:ff:ff:ff:ff
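
The workaround above (bring the parent up before adding the SR-IOV device) can be sketched as a pre-check. The names `iface_admin_up`, `ensure_parent_up`, `SYSFS_ROOT` and `IP_CMD` are hypothetical and exist only for this example; the check reads the IFF_UP bit from the interface's sysfs flags file, which is the administrative state that `ip link set dev ... up` changes.

```shell
# Sketch only: iface_admin_up, ensure_parent_up, SYSFS_ROOT and IP_CMD
# are hypothetical names for this example, not LXD options.

# Succeed if the interface is administratively up (IFF_UP, bit 0x1 of
# the sysfs "flags" file). Returns 2 if the interface is not found.
iface_admin_up() {
    local flags_file="${SYSFS_ROOT:-/sys}/class/net/$1/flags"
    local flags
    [ -r "$flags_file" ] || return 2
    flags=$(cat "$flags_file")
    [ $(( $flags & 0x1 )) -ne 0 ]
}

# Bring the parent up only when needed (requires root on a real host).
ensure_parent_up() {
    local parent="$1"
    local rc=0
    iface_admin_up "$parent" || rc=$?
    case "$rc" in
        0) echo "$parent is already up" ;;
        1) echo "bringing $parent up"
           ${IP_CMD:-ip} link set dev "$parent" up ;;
        *) echo "no such interface: $parent" >&2
           return 1 ;;
    esac
}
```

On the host in this thread, that would mean running `ensure_parent_up enp4s0f0` as root before `lxc config device add mycontainer eth1 nic nictype=sriov parent=enp4s0f0`. As shown above, VFs that were already created while the parent was down also had to be brought up explicitly.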

stgraber commented 6 years ago

Oh, that's interesting and not something I remember seeing on our test hardware, but we were testing on Mellanox whereas you seem to be on Intel.

I'll do another check on our test machine, it'd be nice if I could reproduce the behavior so I can then push a fix. By the sound of it, all it'd take is having us force the parent to be up.

stgraber commented 6 years ago

So I've confirmed that Mellanox does not have this behavior. Since my only Intel SRIOV hardware is a machine where I do use the parent NIC, I can't actually test the fix.

sw37th commented 6 years ago

I've tried the latest (edge) LXD from the snap. I was able to add a virtual function to a container without explicitly bringing UP the parent device. LXD now handles the SR-IOV device of the Intel X520-DA2 well. Thanks!