CentaurusInfra / mizar

Mizar – Experimental, High Scale and High Performance Cloud Network https://mizar.readthedocs.io
https://mizar.readthedocs.io
GNU General Public License v2.0
112 stars 50 forks source link

Add XDP hardware offload support for certain Mizar functions with Netronome NIC #658

Closed yangpenger closed 1 year ago

yangpenger commented 2 years ago

What type of PR is this?

/kind design

What this PR does / why we need it:

This PR can offload some functions in trn_transit_xdp.c on Netronome NIC.

Which issue(s) this PR fixes:

Not fixes any issue.

Special notes for your reviewer:

The deployment of offloaded XDP only works when detecting a supported interface. Otherwise, offloaded XDP will not be deployed and Mizar works as normal. To make no effect on Mizar, we only add codes and do not change any logic of Mizar.

Does this PR introduce a user-facing change?:

In deploy.mizar.yaml, we support one env variable named "OFFLOAD_XDP" for offloaded XDP on and off.

vinaykul commented 2 years ago

Please look into the unit test failures. https://github.com/CentaurusInfra/mizar/runs/7871759634?check_suite_focus=true

You can run 'make unittest' and then run './build/tests/test_dmn' to repro the issue and investigate failures.

yangpenger commented 2 years ago

It is our first time contributing to an open source project, so some stupid mistakes may exist in our codes. Anyway, thanks for the review @vinaykul . I have addressed some problems, but I think we need discussions for some details in Mizar's Weekly Open Source Meeting if necessary.

vinaykul commented 2 years ago

Thanks for the updates. As discussed in the community meeting yesterday, I'll look forward to fixes and updates that will hopefully be able to get rid of the new CLI extensions and use the flag Phu has added to load the new XDP offload program on Netronome NIC while running the current transit XDP program on the same NIC in generic mode. I'll review the code from the top once we have those changes.

vinaykul commented 1 year ago

It would also be helpful to capture in the PR description the different functions that are being offloaded. It is not clear whether bouncer or divider or both of those are being offloaded. Could you also please add the reasons for the prefix length in the offloaded code used in the offload map lookup keys? Could you also please add a list of all the limitations we encountered with Netronome in the issue description?

I think we also need an E2E test for this code.

vinaykul commented 1 year ago

Could we also get performance numbers of the offloaded eBPF code vs generic eBPF code?

vinaykul commented 1 year ago

/hold

vinaykul commented 1 year ago

LGTM We can merge this pending Phu's LGTM

xieus commented 1 year ago

@yangpenger Can you update your PR to address Phu's comments, mostly minor? We would like to move this forward quickly. Thanks.

xieus commented 1 year ago

@vinaykul @phudtran Pls take another quick look and merge if all good :-)

phudtran commented 1 year ago

Thanks LGTM.