WireGuard / wireguard-vyatta-ubnt

WireGuard for Ubiquiti Devices
https://www.wireguard.com/
GNU General Public License v3.0

Hardware offload or general crypto optimizations for Cavium Octeon #120

Closed by mzpqnxow 2 years ago

mzpqnxow commented 2 years ago

Hello there,

Some time back I began using WireGuard as both a server and client on a small fleet of Ubiquiti EdgeRouter devices, specifically the ones with the Cavium Octeon MIPS64 chip, which is practically all of them, save one or two models.

I did a little bit of reading as I was curious about what options there were for acceleration, to either reduce CPU load, improve throughput, or (ideally) both

I read this (very old) UBNT forum post which you participated in. To jog your memory, here was your comment:

I haven't even begun optimizing for the EdgeRouter's architecture. I'll need to write MIPS64 primitives and maybe even figure out how to utilize the offloading chip. The EdgeRouter kernel does not have CONFIG_PADATA, which means we're stuck to one CPU per flow, instead of nicely parallelizing encryption across all CPUs. I'll be able to get that aspect sorted eventually though. Completely unoptimized on my ERL3, I get around 80 mb/s, which isn't bad for a first run. But it's nowhere near the performance it should be getting and eventually will be getting. This benchmark will only get faster, of course.

I noticed a UBNT developer/rep replied essentially offering to enable various features in their kernel configuration if it facilitated this work, which seemed encouraging

I'll get to the point now :)

  1. Has there been any work done since that time to substantially optimize for Octeon platforms?
  2. Are you seriously planning any such work?
  3. If you are planning to look further into this, are there blockers (e.g. UBNT not playing along with their kernel build config) or is it just the usual case of no time to allocate?

The way I understand things, there are three ways to go about optimizing:

Obviously, utilizing available hardware features is ideal, since it can reduce the load on the CPU. I'm not deeply familiar with what the Octeon offers, but I do know that there is a crypto co-processor, and that the UBNT devices offload NAT and "plain" packet routing (as well as IPsec and VLANs) to hardware.

Any thoughts/comments are appreciated. I'm happy to sponsor the effort, but the sum of money I would offer (~$500) is probably a bit of a joke compared to the cost of the time to do this. Regardless, it's offered. I'm also happy to go through all of the prior work and reference manuals to cut down on the "grunt" work ;)

I'm self-interested here because of the UBNT devices I'm responsible for, but I know Cavium has a presence in a lot of the larger enterprise devices (layer-7 firewalls, VPNs, etc.) so maybe there's some value there. Though in that case it would be nice for one of those vendors to sponsor the work...

Thanks!

mzpqnxow commented 2 years ago

Also, I realize that the work for this would probably be done under the wireguard-linux project, but it seemed most appropriate to open the issue here first.

mzpqnxow commented 2 years ago

Tagging @zx2c4 though I believe @Lochnair is the maintainer of this repo

zx2c4 commented 2 years ago

There is already https://git.zx2c4.com/wireguard-linux/tree/arch/mips/crypto/poly1305-mips.pl

Feel free to additionally port https://git.zx2c4.com/wireguard-linux/tree/arch/mips/crypto/chacha-core.S to mips64.
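
For context, here is a minimal sketch of the ChaCha quarter-round, the inner operation that a hand-written assembly ChaCha core typically optimizes. It is written in Python purely for illustration and follows the public ChaCha20 specification (RFC 8439). It is not taken from the linked chacha-core.S, and the names `rotl32` and `quarter_round` are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # ChaCha operates on 32-bit words


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarter_round(a, b, c, d):
    """One ChaCha quarter-round on four 32-bit words (add, xor, rotate)."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


if __name__ == "__main__":
    # Input words from the quarter-round test vector in RFC 8439, section 2.1.1
    out = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
    print([hex(w) for w in out])
```

A full ChaCha20 block applies this operation to the columns and diagonals of a 4x4 state for 20 rounds. Because every step is a 32-bit add, xor, or rotate, a port to another architecture is mostly a matter of register allocation and instruction scheduling rather than new math.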

mzpqnxow commented 2 years ago

Thank you! They told me MIPS asm would be a useless skill... :P

zx2c4 commented 2 years ago

Well, increasingly arcane, but not quite useless yet. Odd MIPS knowledge floating around in my head has helped me out in all sorts of unforeseen ways over the years... It's also fun to write.