WorksOnArm / equinix-metal-arm64-cluster

Arm and Equinix Metal have partnered to make powerful Neoverse-based Armv8 bare metal infrastructure, including latest-generation Ampere systems, available to open source software developers to build, test, and optimize for the Arm64 architecture.
http://www.worksonarm.com

VPP + ligato/vpp_agent testing on ARM - request for access to the Works on Arm test and CI infrastructure #56

Closed: stanislav-chlebec closed this 4 years ago

stanislav-chlebec commented 6 years ago

Name, email, company, job title

Stanislav Chlebec, stanislav.chlebec@pantheon.tech, PANTHEON technologies s.r.o., Software Engineer in Test

Project Title and description

ARM + ligato/vpp_agent testing on ARM. We want to prepare VPP (an FD.io project) and Ligato/vpp-agent for the ARM platform.

Which members of the community would benefit from your work?

My goal is to prepare ligato/vpp_agent, a Golang implementation of a control/management plane for VPP-based cloud-native Virtual Network Functions (VNFs), to run on the ARM platform. I expect to encounter some issues with VPP while testing, which I will report to the project maintainers. Once ligato/vpp_agent has been tested on the ARM platform, it can offer interesting functionality to the community.

Is the code that you’re going to run 100% open source? If so, what is the URL or URLs where it is located?

Yes, it is. Both projects are under the Apache License 2.0.

What infrastructure (computing resources and network access) do you need? (see: https://www.packet.net/bare-metal/)

I expect that this https://www.packet.net/bare-metal/servers/tiny/ could be enough for my purpose: Ubuntu 16.04 LTS ARM, 4 physical cores @ 2.4 GHz, 8 GB of DDR3 RAM, 80 GB of SSD.

It is a one-time project planned to last from 6 months to 1 year.

Please state your contributions to the open source community and any other relevant initiatives

Pantheon Technologies is a leading contributor to OpenDaylight (ODL), another open source project. It could be interesting later to get VPP + ODL working on the ARM platform.

Brag a little bit about yourself, please! https://pantheon.tech/

vielmetti commented 6 years ago

Approved, invite sent. Please deploy one c1.large.arm for the Arm tests, and a t1.small.x86 for the control plane as needed.

vielmetti commented 6 years ago

Looks like https://github.com/ligato/vpp-agent is the link on GitHub. If you open issues that you find, please link them back here for tracking.

stanislav-chlebec commented 6 years ago

ok. Thanks.

vielmetti commented 6 years ago

The equipment has been deployed, closing this issue.

vielmetti commented 6 years ago

Expanded scope includes cooperation on enabling ligato/vpp-agent and the contiv/vpp project (https://github.com/contiv/vpp).

vielmetti commented 6 years ago

A related issue is https://github.com/contiv/vpp/issues/786

vielmetti commented 6 years ago

Reopening this because we are trying to set up Layer 2 and DPDK and there are some open questions.

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

vielmetti commented 6 years ago

We are running into some issues, and I'm trying to track down a public repository to deal with the technical bits of this; I will update with that issue number.

stanislav-chlebec commented 6 years ago

Trying to get FD.io VPP working with the NIC present in the ThunderX (c1.large.arm). Not successful: https://gist.github.com/stanislav-chlebec/a36c43c2eee8c16d3d297e67bac1d711

Conclusion: VPP was successfully built on the system, but VPP is not able to work with the NIC using either vfio-pci or uio_pci_generic.

Would it be possible to get an extra Ethernet card for the server, such as an Intel 82599-series card, which supports DPDK?
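
For reference, the way I inspect what the NIC is currently bound to looks roughly like this (dpdk-devbind.py ships in DPDK's usertools directory; the grep pattern is only illustrative):

    # list the Cavium/ThunderX network functions and the kernel drivers claiming them
    lspci -nnk | grep -i -A 3 cavium

    # DPDK's binding helper shows which devices are bound to a DPDK-compatible
    # driver (vfio-pci, uio_pci_generic) and which are still on kernel drivers
    sudo ./usertools/dpdk-devbind.py --status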

vielmetti commented 6 years ago

Thanks Stanislav, I will investigate further based on your detailed notes. I have a call scheduled for Friday to address this.

vielmetti commented 6 years ago

Also, can you try reprovisioning the server and loading Ubuntu 16.04 LTS instead of Ubuntu 18.04 LTS? I'm trying to rule out an operating system regression as a cause.

stanislav-chlebec commented 6 years ago

Actually, I have another provisioned server with Ubuntu 16.04, but currently I am not able to build VPP there. Anyway, I will try to repeat the steps there as well.

vielmetti commented 6 years ago

There are a lot of notes at https://wiki.fd.io/view/VPP/AArch64, along with a regular call and an IRC channel.

stanislav-chlebec commented 6 years ago

This is the log for Ubuntu 16.04: https://gist.github.com/stanislav-chlebec/8a04fdf77246f1568306769e9ccf2c02

It shows some issues with DPDK; please search for the testpmd command. At first there was a failure, and after some experimenting it was suddenly all right.

vielmetti commented 6 years ago

There is a JIRA at https://jira.fd.io/secure/Dashboard.jspa which may be the right place to raise issues, or at least to search for them.

vielmetti commented 6 years ago

Did you install DPDK from the Ubuntu instructions at

https://help.ubuntu.com/lts/serverguide/DPDK.html.en

or from some other location?

stanislav-chlebec commented 6 years ago

https://lists.fd.io/g/vpp-dev/topic/vpp_cavium_thunderx_arm64/23753373?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,23753373

stanislav-chlebec commented 6 years ago

To answer your question about installing DPDK: I took http://dpdk.org/git/dpdk and either issued the command make install T=arm64-thunderx-linuxapp-gcc or went through the usertools/dpdk-setup.sh script, where I chose option 6. I also used the same script to set up the environment for DPDK, in particular these menu options (a rough non-interactive equivalent is sketched below):

    [6] arm64-thunderx-linuxapp-gcc
    [18] Insert IGB UIO module
    [19] Insert VFIO module
    [22] Setup hugepage mappings for NUMA systems
    [23] Display current Ethernet/Crypto device settings
    [24] Bind Ethernet/Crypto device to IGB UIO module
    [25] Bind Ethernet/Crypto device to VFIO module
    [26] Setup VFIO permissions
    [29] List hugepage info from /proc/meminfo
    [30] Unbind devices from IGB UIO or VFIO driver
    [31] Remove IGB UIO module
    [32] Remove VFIO module
    [34] Remove hugepage mappings
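
A rough non-interactive equivalent of those menu choices, assuming the dpdk checkout as the working directory and the VF PCI address used later in this thread (the 2 MB hugepage count is just an example; 1 GB pages are already reserved on the kernel command line here):

    # option [6]: build DPDK for the ThunderX target
    make install T=arm64-thunderx-linuxapp-gcc

    # option [19]: load the VFIO driver
    sudo modprobe vfio-pci

    # roughly option [22]: reserve additional 2 MB hugepages at runtime (example count)
    echo 1024 | sudo tee /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

    # options [23]/[25]: check current bindings and bind the ThunderX VF to vfio-pci
    sudo ./usertools/dpdk-devbind.py --status
    sudo ./usertools/dpdk-devbind.py --bind=vfio-pci 0002:01:00.2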

stanislav-chlebec commented 6 years ago

To follow up on my previous posts about problems with DPDK and VPP in connection with the Cavium ThunderX NIC:

  Prerequisites (a quick way to verify these settings at runtime is sketched at the end of this post):

    stanislav@vppagent:~$ more /etc/default/grub
    GRUB_DEFAULT=0
    #GRUB_HIDDEN_TIMEOUT=0
    GRUB_HIDDEN_TIMEOUT_QUIET=true
    GRUB_TIMEOUT=10
    GRUB_DISTRIBUTOR=Ubuntu
    GRUB_CMDLINE_LINUX="console=ttyAMA0,115200n8 biosdevname=0 net.ifnames=1 crashkernel=auto LANG=en_US.UTF-8 iommu.passthrough=1 hugepagesz=1GB hugepages=16 default_hugepagesz=1GB iomem=relaxed"
    GRUB_TERMINAL=serial
    GRUB_SERIAL_COMMAND="serial"

stanislav@vppagent:~$ more /etc/fstab

UUID=DD02-5841  /boot/efi   vfat    errors=remount-ro   0   2
#UUID=d379045b-2646-494b-9870-bd8eb0bdbc8d  none    swap    none    0   0
UUID=e50fa6e2-6cf8-4e2b-8397-ef2f8c534172   /   ext4    errors=remount-ro   0   1
nodev /mnt/huge_1GB hugetlbfs pagesize=1GB 0 0
#nodev /run/vpp/hugepages hugetlbfs pagesize=1GB 0 0

stanislav@vppagent:~$ more /etc/modules

#bonding
#uio_pci_generic
vfio-pci

stanislav@vppagent:~$ more /etc/network/interfaces

#https://help.packet.net/technical/networking/layer-2-configurations
#Disabling bond0, putting eth0 to a single VLAN that has an internet gateway, and putting eth1 in a different, private VLAN. ***Currently Disabled

auto lo
iface lo inet loopback

#auto bond0
#iface bond0 inet static
#    address 147.75.98.202
#    netmask 255.255.255.252
#    gateway 147.75.98.201
#    bond-downdelay 200
#    bond-miimon 100
#    bond-mode 4
#    bond-updelay 200
#    bond-xmit_hash_policy layer3+4
#    bond-lacp-rate 1
#    bond-slaves enP2p1s0f1 enP2p1s0f2
#    dns-nameservers 147.75.207.207 147.75.207.208
#iface bond0 inet6 static
#    address 2604:1380:1:5800::1
#    netmask 127
#    gateway 2604:1380:1:5800::
#
#auto bond0:0
#iface bond0:0 inet static
#    address 10.99.164.1
#    netmask 255.255.255.254
#    post-up route add -net 10.0.0.0/8 gw 10.99.164.0
#    post-down route del -net 10.0.0.0/8 gw 10.99.164.0
#
#auto enP2p1s0f1
#iface enP2p1s0f1 inet manual
#    bond-master bond0
#
#auto enP2p1s0f2
#iface enP2p1s0f2 inet manual
#    pre-up sleep 4
#    bond-master bond0

down enP2p1s0f2
iface enP2p1s0f2 inet static
    address 192.168.1.4
    netmask 255.255.255.0

auto enP2p1s0f1
iface enP2p1s0f1 inet static
    address 147.75.98.202
    netmask 255.255.255.252
    gateway 147.75.98.201
  1. Ubuntu 18.04.1 LTS, kernel 4.15.0-20-generic, gcc (Ubuntu/Linaro 5.5.0-12ubuntu1) 5.5.0 20171010

    • DPDK http://dpdk.org/git/dpdk branch master commit 9724d127f2d729c7475c4667bc14f95dc17fff8a
    • using dpdk's usertools/dpdk-setup.sh utility I was able to successfully build DPDK for option [6] arm64-thunderx-linuxapp-gcc
    • I was able to bind network device 0002:01:00.2 'THUNDERX Network Interface Controller virtual function a034' to vfio-pci (DPDK-compatible) driver (using tool sudo ./usertools/dpdk-devbind.py --bind=vfio-pci 0002:01:00.2 )
    • I was able to run the built testpmd tool in the dpdk folder: sudo ./arm64-thunderx-linuxapp-gcc/app/testpmd -c 0xff -n 4 --huge-dir /mnt/huge_1GB --file-prefix vpp -w 0002:01:00.2 --master-lcore 1 --socket-mem 64,64. In its output I found the message: EAL: pci_map_resource(): cannot mmap(40, 0xffff80200000, 0x200000, 0x40000000000): Invalid argument (0xffffffffffffffff)
  2. Ubuntu 18.04.1 LTS, kernel 4.15.0-20-generic, gcc (Ubuntu/Linaro 5.5.0-12ubuntu1) 5.5.0 20171010

    • VPP https://gerrit.fd.io/r/vpp, branch master, commit e4a9eb7873f140f88be7fffb83e1215fbf181116
    • I was able to build (via: make build)
    • using tool: sudo ./usertools/dpdk-devbind.py --bind=vfio-pci 0002:01:00.2
    • start vpp (via: make run)
    • I get to the VPP CLI, where I can issue the command show interface; I see the NIC VirtualFunctionEthernet1/0/2 and am able to set it to state UP (set interface state VirtualFunctionEthernet1/0/2 up)

Issue: when I start VPP with this config file

stanislav@vppagent:~$ more /etc/vpp/contiv-vswitch.conf
unix {
    nodaemon
    cli-listen /run/vpp/cli.sock
    cli-no-pager
    coredump-size unlimited
    full-coredump
    poll-sleep-usec 100
}
nat {
    endpoint-dependent
}
api-trace {
   on
   nitems 500
}
dpdk {
    dev 0002:01:00.2
    uio-driver vfio-pci
}

Then I need to point VPP at this config file like this (in the VPP folder):

stanislav@vppagent:~/work/vpp$ STARTUP_CONF=/etc/vpp/contiv-vswitch.conf
stanislav@vppagent:~/work/vpp$ export STARTUP_CONF
stanislav@vppagent:~/work/vpp$ make run

Then I got the message: /home/stanislav/work/vpp/build-root/install-vpp_debug-native/vpp/bin/vpp[6330]: dpdk: Unsupported PCI device 0x177d:0xa034 found at PCI address 0002:01:00.2

This did not prevent me from using the device VirtualFunctionEthernet1/0/2 in VPP.

3. Finally, I tried to enable Kubernetes, successfully. See https://github.com/stanislav-chlebec/vpp/blob/master/docs/MANUAL_INSTALL_CAVIUM.md. It is still WORK IN PROGRESS; I only report this to announce that some obstacles were overcome. Please wait for further news. Thanks.
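
As referenced under the prerequisites above, here is a quick way I verify that those boot-time settings actually took effect before starting testpmd or VPP (standard procfs/sysfs paths, nothing DPDK-specific):

    # 1 GB hugepages reserved on the kernel command line should be visible here
    grep Huge /proc/meminfo
    cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

    # the hugetlbfs mount from /etc/fstab
    mount | grep huge

    # effective kernel command line (iommu.passthrough=1, hugepagesz=1GB, ...)
    cat /proc/cmdline

    # vfio-pci from /etc/modules should be loaded
    lsmod | grep vfio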

stanislav-chlebec commented 6 years ago

https://hub.docker.com/r/contivvpp/vswitch-arm64/ For now, the latest version is reverted to the older build v1.2-alpha-171-g3f83604f. The reason: the newest vswitch container is crashing in Kubernetes; see https://jira.fd.io/browse/VPP-1394. The same policy is applied to the other ARM64 contivvpp images.
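
If you need the known-good build explicitly rather than relying on the reverted latest tag, something like the following should work (the tag name is taken from the build above; I have not verified every tag published on Docker Hub):

    docker pull contivvpp/vswitch-arm64:v1.2-alpha-171-g3f83604f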

stanislav-chlebec commented 6 years ago

https://hub.docker.com/r/contivvpp/vswitch-arm64/ Versions from v1.3-alpha-155-g490f9d619-de4660a onward include the patch https://gerrit.fd.io/r/#/c/14714/. It now works properly.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

vielmetti commented 4 years ago

Reopening with a request to migrate to different hardware.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.