cncf / cnf-testbed

ARCHIVED: 🧪🛏️Cloud-native Network Function (CNF) Testbed --> See LFN Cloud Native Telecom Initiative https://wiki.lfnetworking.org/pages/viewpage.action?pageId=113213592
Apache License 2.0

VPP with dpdk plugin in unprivileged container #291

Open · taylor opened 5 years ago

taylor commented 5 years ago

When running VPP inside a container, some issues have been seen when trying to use NIC ports/interfaces (PFs/VFs) through the dpdk plugin.

Running the container as privileged (securityContext -> privileged: true) works as expected and can be sufficient, but it is still not ideal: a privileged container gets unrestricted access to the host's devices and capabilities.

Consider the following configuration file:

apiVersion: v1
kind: Pod
metadata:
  name: sriov-simple-vpp-pod
  labels:
    env: test
spec:
  tolerations:
  - operator: "Exists"
  hostNetwork: true
  containers:
  - name: sriovdpdk
    image: soelvkaer/vppcontainer:latest
    imagePullPolicy: IfNotPresent
    securityContext:
      privileged: false
    command: [ "/bin/bash", "/opt/vpp/run_vpp.sh" ]
    resources:
      requests:
        hugepages-2Mi: 100Mi
        memory: 200Mi
      limits:
        hugepages-2Mi: 100Mi
        memory: 200Mi
    volumeMounts:
    - name: hugepage
      mountPath: /dev/hugepages/
    - name: vppconf
      mountPath: /opt/vpp/
  volumes:
    - name: hugepage
      emptyDir:
        medium: HugePages
    - name: vppconf
      configMap:
        name: vppconfig
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: vppconfig
data:
  startup.conf: |
    unix {
      nodaemon
      log /var/log/vpp/vpp.log
      full-coredump
      cli-listen /run/vpp/cli.sock
      gid vpp
      startup-config /etc/vpp/setup.gate ## No configuration provided
    }
    api-trace {
      on
    }
    api-segment {
      gid vpp
    }
    socksvr {
      default
    }
    cpu {
    }
    dpdk {
      dev 0000:1a:00.1 ## Port handled by vfio-pci driver in host
      no-multi-seg
      no-tx-checksum-offload
    }
    plugins {
      plugin default { disable }
      plugin dpdk_plugin.so { enable }
      plugin memif_plugin.so { enable }
    }
  run_vpp.sh: |
    vpp -c /opt/vpp/startup.conf 2>&1 | tee /etc/vpp/output.log

Running the pod results in the following errors from VPP:

vlib_plugin_early_init:361: plugin path /usr/lib/x86_64-linux-gnu/vpp_plugins:/usr/lib/vpp_plugins
(( Omitting list of disabled plugins ))
load_one_plugin:189: Loaded plugin: dpdk_plugin.so (Data Plane Development Kit (DPDK))
(( Omitting list of disabled plugins ))
load_one_plugin:189: Loaded plugin: memif_plugin.so (Packet Memory Interface (memif) -- Experimental)
(( Omitting list of disabled plugins ))
unix_config:463: couldn't open log '/var/log/vpp/vpp.log'
vpp[27]: clib_elf_parse_file: open `linux-vdso.so.1': No such file or directory
vpp[27]: buffer: vlib_physmem_shared_map_create: pmalloc_map_pages: Unable to fulfill huge page allocation request: No such file or directory

vpp[27]: buffer: falling back to non-hugepage backed buffer pool
vpp[27]: buffer: vlib_physmem_shared_map_create: pmalloc_map_pages: Unable to fulfill huge page allocation request: No such file or directory

vpp[27]: buffer: falling back to non-hugepage backed buffer pool
vpp[27]: vlib_sort_init_exit_functions:161: order constraint fcn 'dns_init' not found
vpp[27]: vnet_feature_init:143: WARNING: arp arc: last node is error-drop, but expected arp-disabled!
vpp[27]: vnet_feature_arc_init:250: feature node 'acl-plugin-out-ip6-fa' not found (before 'ip6-dvr-reinject', arc 'ip6-output')
vpp[27]: vnet_feature_arc_init:250: feature node 'nat44-in2out-output' not found (before 'ip4-dvr-reinject', arc 'ip4-output')
vpp[27]: vnet_feature_arc_init:250: feature node 'acl-plugin-out-ip4-fa' not found (before 'ip4-dvr-reinject', arc 'ip4-output')
vpp[27]: clib_socket_init: bind (fd 7, '/run/vpp/stats.sock'): No such file or directory
vpp[27]: vlib_pci_bind_to_uio: Skipping PCI device 0000:1a:00.1: missing kernel VFIO or UIO driver
vpp[27]: dpdk: EAL init args: -c 2 -n 4 --in-memory --file-prefix vpp --master-lcore 1
EAL: FATAL: rte_service_init() failed
error allocating rte services array

Several variations of the above configuration, with additional mounts and capabilities added, have been tested as well. So far these tests have all been unsuccessful, and the only solution that has worked is to run the pod as privileged.
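For illustration, one variation of that kind looked roughly like the following (a sketch, not the exact manifest tested; the capability set shown is an assumption), replacing the privileged flag with individual capabilities in the container's securityContext:

securityContext:
  privileged: false
  capabilities:
    # Assumed capability set, for illustration only; combinations
    # like this were still not sufficient for the dpdk plugin to initialize.
    add: ["IPC_LOCK", "NET_ADMIN", "SYS_NICE"]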

At this point, any container using PFs/VFs has to be run as privileged. An example of this can be seen in https://github.com/cncf/cnf-testbed/pull/288. While each pod is able to see and use all of the interfaces, using a CNI such as the SRIOV Network Device Plugin it is possible to assign a subset of interfaces to each pod, and by using this when generating the VPP configuration, the interfaces used by each pod can be limited to the desired set (see the sketch below). This solution works in a controlled environment, under the assumption that each pod sticks to its requested resources. It is, however, possible for a pod to use a modified VPP configuration that claims more, or even all, of the resources on the host.
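As a sketch of that approach, each pod requests a number of VFs as an extended resource instead of seeing every interface on the host. The resource name below depends on how the device plugin is configured and is an assumption here:

resources:
  requests:
    intel.com/intel_sriov_netdevice: '2' # assumed resource name from the device plugin configMap
    hugepages-2Mi: 100Mi
    memory: 200Mi
  limits:
    intel.com/intel_sriov_netdevice: '2'
    hugepages-2Mi: 100Mi
    memory: 200Mi

The PCI addresses of the allocated VFs are exposed to the container through environment variables derived from the resource name, and can then be templated into the dpdk { dev ... } stanza when generating startup.conf.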

michaelspedersen commented 5 years ago

Updated ticket with additional information

mackonstan commented 5 years ago

We now have two Jira tickets tracking unprivileged VPP in FD.io:

  1. VPP-1787 Running VPP as unprivileged process

  2. CSIT-1627 Running VPP in unprivileged containers

We are pursuing both actively with the LFN FD.io community, and will update here when the state of either Jira ticket changes.

mackonstan commented 5 years ago

Note that the collected error logs are from @pmikus's tests in the FD.io CSIT labs. They have been reviewed by an FD.io VPP committer and validated as correct.

pmikus commented 4 years ago

To start with baseline tests, there are two minimal settings required:

  1. Huge pages

    • Manual way:

      $ sudo umount /dev/hugepages
      $ sudo mount -t hugetlbfs hugetlbfs /dev/hugepages -o uid=testuser -o gid=testuser
      $ echo 1000 | sudo tee /proc/sys/vm/hugetlb_shm_group

    • Automated way (ideal for persistence and bootstrapping - preferred):

      $ echo "hugetlbfs /dev/hugepages hugetlbfs mode=1777,uid=$(id -u),gid=$(id -g) 0 0" | sudo tee -a /etc/fstab
      $ echo "vm.hugetlb_shm_group=$(id -g)" | sudo tee -a /etc/sysctl.conf

  2. VFIO-PCI

    • It is enough to allow only those interfaces that you want to use. I think I also found an automated way via driver defaults, but the command below should do the trick:

      $ sudo chown $(id -nu):$(id -ng) /dev/vfio/*

  3. Containers

    This requires more understanding and depends on the use case (Docker/LXC/K8S, base image used, etc.); see the sketch below for one possible Kubernetes direction.
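For the Kubernetes case, a minimal sketch of the direction (assuming the host has been prepared as in 1. and 2. above; as discussed in this thread, this has not yet been shown to actually work unprivileged):

apiVersion: v1
kind: Pod
metadata:
  name: vpp-unprivileged-test
spec:
  hostNetwork: true
  containers:
  - name: vpp
    image: soelvkaer/vppcontainer:latest
    securityContext:
      privileged: false
    command: [ "/bin/bash", "/opt/vpp/run_vpp.sh" ]
    resources:
      requests:
        hugepages-2Mi: 100Mi
        memory: 200Mi
      limits:
        hugepages-2Mi: 100Mi
        memory: 200Mi
    volumeMounts:
    - name: hugepage
      mountPath: /dev/hugepages/
    - name: vfio # expose the host's VFIO device nodes, chown'ed as in 2. above
      mountPath: /dev/vfio/
    - name: vppconf
      mountPath: /opt/vpp/
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
  - name: vfio
    hostPath:
      path: /dev/vfio/
  - name: vppconf
    configMap:
      name: vppconfig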

michaelspedersen commented 4 years ago

Been looking into the vfio part of this over the last couple of days.

For now I don't see any way of running VPP containers without the privileged flag.

pmikus commented 4 years ago

I will take a look