[Open] borkmann opened this issue 2 months ago
Small update: with the self-compiled driver, we now end up with the following rejection (which also implicitly confirms that the ena driver on RHEL lacks some backports):
[ 905.417013] ena: loading out-of-tree module taints kernel.
[ 905.417109] ena: module verification failed: signature and/or required key missing - tainting kernel
[ 905.421008] ena 0000:00:05.0: Elastic Network Adapter (ENA) v2.12.3g
[ 905.430550] ena 0000:00:05.0: ENA device version: 0.10
[ 905.430552] ena 0000:00:05.0: ENA controller version: 0.0.1 implementation version 1
[ 905.530529] ena 0000:00:05.0: ENA Large LLQ is disabled
[ 905.542476] ena 0000:00:05.0: Elastic Network Adapter (ENA) found at mem c0510000, mac addr 02:f6:cf:1b:b2:59
[ 905.558391] ena 0000:00:05.0 eth0: Local page cache is disabled for less than 16 channels
[ 906.385518] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 1160.334921] ena 0000:00:05.0 eth0: Command parameter 46 is not supported
[ 1260.180617] ena 0000:00:05.0 eth0: XDP program is set, changing the max_mtu from 9216 to 3498
[ 1260.875419] ena 0000:00:05.0 eth0: xdp: dropped unsupported multi-buffer packets
So it looks like the "Command parameter 46 is not supported" message may not have been directly related (?).
Are there plans to support XDP mbuf / multi-buffer for ena? Fwiw, that would lift the requirement to lower the MTU under XDP.
Hi @borkmann
Thank you for your inquiry.
The message “Command parameter 46 is not supported” comes from the function ena_get_rxnfc() and is printed because currently the driver does not support flow steering. It is unrelated to xdp being able to load (as you saw when you compiled the github driver). If you don’t rely on flow steering you can probably ignore it.
I tried loading XDP programs on RHEL 9 with your kernel and got an error (which may be different from what you see in dmesg), so there does seem to be an issue with XDP support in the driver that ships with this kernel. We will look into it; thank you for the heads-up.
It seems that your issue with this kernel may be different than what I see. It may be helpful if you could share with me:
I can’t tell you whether it was fixed in upstream Linux until I root-cause your issue. However, the issue I see myself was indeed fixed in upstream Linux and has not yet been backported to RHEL 9.
Do I understand correctly that you are able to run your xdp program when using the latest github driver that you built yourself on RHEL 9?
As for XDP multi-buffer support in ena: it is currently under development and will indeed lift the requirement to lower the MTU under XDP, but I can’t share the release timeline here.
Arthur
Hi @borkmann,
Another thing. We are aware of an issue where XDP_REDIRECT currently does not work on RHEL 9 with the latest GitHub driver. We have a fix, and it will be included in an upcoming release. Meanwhile, if you need XDP_REDIRECT for your testing on RHEL 9, please use the attached workaround patch.
0001-Temporary-fix-for-XDP_REDIRECT-not-working-on-RHEL-9.patch
Arthur
Hi @borkmann,
You originally attached your dmesg output from the failed XDP program load, ending at:
[ 3037.066035] ena 0000:00:05.0 eth0: Command parameter 46 is not supported <----------------
Can you please share with me (here or via email at akiyano@amazon.com) what happens in dmesg after that? When I hit an XDP loading issue with this kernel, I see additional messages, and I'd like to make sure you are seeing the same issue as I am, so that when I work on a fix I know it will also help your case.
Thanks!
Hi @borkmann, We expect this issue to be addressed in the upcoming RHEL 9.5 release, which should come out near the end of 2024. See https://www.redhat.com/en/blog/upcoming-improvements-red-hat-enterprise-linux-minor-release-betas?sc_cid=701f2000000tyBjAAI regarding the release schedule.
Awesome, thanks so much!
Cc'ing @strongjz. We've seen this in dmesg:
[ 905.558391] ena 0000:00:05.0 eth0: Local page cache is disabled for less than 16 channels
[ 906.385518] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 1160.334921] ena 0000:00:05.0 eth0: Command parameter 46 is not supported
[ 1260.180617] ena 0000:00:05.0 eth0: XDP program is set, changing the max_mtu from 9216 to 3498
[ 1260.875419] ena 0000:00:05.0 eth0: xdp: dropped unsupported multi-buffer packets
We basically weren't sure whether the "Command parameter 46" message was related or not, which is why we started asking here. The actual XDP-related error was the lack of multi-buffer support for XDP.
@borkmann,
To make sure we are on the same page: I'm still not 100% sure what issue you are seeing with the ENA driver preinstalled on RHEL 9.3 that you don't see with the GitHub driver.
There are some known issues present up to RHEL 9.4 whose fixes will be backported in RHEL 9.5, but from what you are saying I'm not sure you are experiencing them.
Are you experiencing issues with the driver preinstalled in RHEL 9.3 that are not present in the GitHub driver? What are they?
Regarding your last message:
We've had two other folks run 9.3 default Ena driver fine. The 9.4 for me caused issues. I'm going to test next week with 9.3.
@strongjz Can you please specify what you are running and what the issues are?
Preliminary Actions
Driver Type
Linux kernel driver for Elastic Network Adapter (ENA)
Driver Tag/Commit
5.14.0-427.24.1.el9_4.x86_64
Custom Code
No
OS Platform and Distribution
amazon/RHEL-9.3.0_HVM-20240229-x86_64-27-Hourly2-GP3
And from dmesg:
Bug description
When trying to load our XDP program in Cilium on RHEL 9.4, we're running into the following error in dmesg, which seems correlated timing-wise:
And no XDP program got loaded on the device itself:
Is the ena driver regularly updated via HW enablement on RHEL 9.x? We have users for whom self-building a driver would unfortunately not be an option in production.
We've seen somewhat related issues (#78, amzn/amzn-drivers#241) where this error might hint at XDP.
I hope this is still the right place to ask whether it was fixed upstream; if so, perhaps you have a chance to poke the Red Hat folks to backport the relevant commits into RHEL 9.
Reproduction steps
Expected Behavior
The XDP program loads onto the ena driver.
Actual Behavior
[ 3037.066035] ena 0000:00:05.0 eth0: Command parameter 46 is not supported
Additional Data
No response
Relevant log output
No response
Contact Details
daniel@isovalent.com