amzn / amzn-drivers

Official AWS drivers repository for Elastic Network Adapter (ENA) and Elastic Fabric Adapter (EFA)
453 stars 175 forks source link

ENA/XDP metadata support #265

Open mejedi opened 1 year ago

mejedi commented 1 year ago

xdp_buff has a metadata area, which an XDP eBPF program can use to stash some info (using bpf_xdp_adjust_meta and direct packet access to write data). This information is transferred to sk_buff, and could be accessed by a TC eBPF program. This mechanism enables communication between XDP and TC programs.

Unfortunately, ENA driver doesn't support attaching and transferring metadata.

bpf_xdp_adjust_meta fails with ENOTSUPP when invoked in ENA XDP path.

EMnify would love to see this feature implemented.

davidarinzon commented 1 year ago

Hi @mejedi

Thank you for raising this request, we will look into it. Would be happy to hear more about your use-cases / challenges and how you're planning to use this feature going forward (on top of what's written). You're also welcome to contact me at darinzon@amazon.com.

mejedi commented 1 year ago

Hi @davidarinzon,

As of today, we had a single use of metadata. Our routing logic is implemented in XDP. We rely on the kernel's ARP handling/neighbour cache. When the destination MAC address is in the cache, we are able to fully handle a packet in XDP (bpf_fib_lookup to obtain MAC addresses).

However, when the destination MAC is not yet known, we have to trigger an ARP probe to learn it eventually. It is not possible from XDP, therefore we XDP_PASS to the host stack. We were passing the egress interface index, selected according to the app logic, via metadata (there are multiple interfaces in use). TC program extracted the egress interface index, and did bpf_redirect_neigh.

It worked beautifully in testing, but failed with the real ENA. Quite unexpectedly, metadata support turned out to be optional :(

A workaround would revolve around passing the data inline; the tricky bit is picking an encoding that doesn't legitimately occur on the wire. One option would be picking a distinct ethertype, but it doesn't quite work. Another option is using a funny destination MAC address. Overall, it looks needlessly complicated and inelegant though.

man bpf-helpers provides further insights on the use-cases:

The use of xdp_md->data_meta is optional and programs are not required to use it. The rationale is that when the packet is processed with XDP (e.g. as DoS filter), it is possible to push further meta data along with it before passing to the stack, and to give the guarantee that an ingress eBPF program attached as a TC classifier on the same device can pick this up for further post-processing. Since TC works with socket buffers, it remains possible to set from XDP the mark or priority pointers, or other pointers for the socket buffer. Having this scratch space generic and programmable allows for more flexibility as the user is free to store whatever meta data they need.

davidarinzon commented 1 year ago

Hi @mejedi Thank you for sharing your use-case in detail. We will look into this support and understand the gaps.

zeffron commented 1 year ago

We had a patch locally to enable metadata support. It was as easy setting the second parameter in this line to true. The ENA driver already has sufficient space overhead for XDP metadata.

mejedi commented 1 year ago

@zeffron

We had a patch locally to enable metadata support. It was as easy setting the second parameter in this line to true. The ENA driver already has sufficient space overhead for XDP metadata.

Are you sure? It looks like metadata start should be copied to skb to take advantage of it on TC side, but apparently it is not.

zeffron commented 1 year ago

Ah, we only use XDP, and we made our patch a while ago, so you're probably correct that more needs to be done to copy the metadata to the SKB.

derlaft commented 2 months ago

We had a patch locally to enable metadata support. It was as easy setting the second parameter in this line to true. The ENA driver already has sufficient space overhead for XDP metadata.

The link is now pointing to the wrong place. Here's probably the line that was linked back then: https://github.com/amzn/amzn-drivers/blob/ac09615dd3e7a74529807d1728e0ace31dd22e7b/kernel/linux/ena/ena_netdev.c#L1387

It looks like metadata start should be copied to skb to take advantage of it on TC side, but apparently it is not.

Nevertheless, even without copying to tc being able to use ctx->data_meta for communicating between tail-called xdp programs is already a big improvement.