erpc-io / eRPC

Efficient RPCs for datacenter networks
https://erpc.io/

Use only one DPDK segment to send non-first packets #89

Closed cxz66666 closed 1 year ago

cxz66666 commented 1 year ago

I'm learning eRPC and found the following:

https://github.com/erpc-io/eRPC/blob/994ff3895b18e9a5ace5a3f4df37827529806f3f/src/transport_impl/dpdk/dpdk_transport_datapath.cc#L56-L74

But why not use only one segment, like this:

// Copy the eRPC packet header to the start of the single mbuf segment.
memcpy(rte_pktmbuf_mtod(tx_mbufs[i], uint8_t *), pkthdr, sizeof(pkthdr_t));

// Copy this packet's payload right after the header, into the same segment.
memcpy(rte_pktmbuf_mtod_offset(tx_mbufs[i], uint8_t *, sizeof(pkthdr_t)),
       &msg_buffer->buf_[item.pkt_idx_ * kMaxDataPerPkt],
       pkt_size - sizeof(pkthdr_t));

Is there any consideration for this design? Thanks!

anujkaliaiitd commented 1 year ago

Hi Chen. Thanks for pointing this out. I'd agree that the existing approach in eRPC is sub-optimal, and the single-segment approach would perform better. If you can submit a patch after testing it out (e.g., with the large_rpc_tput benchmark), I'd be happy to merge it.

To explain why I wrote the code as it is: I wrote the Raw transport for eRPC first, which uses Mellanox ibverbs directly for Ethernet packet IO. With the ibverbs interface, the two segments can be transmitted without any memory copies (see https://github.com/erpc-io/eRPC/blob/5c2343b4968a2f740b730fd29100f3acf9c1ff69/src/transport_impl/raw/raw_transport_datapath.cc#L41), whereas coalescing them into one segment requires copying. This is covered in Section 4.2.1 of our NSDI paper. When I wrote the DPDK transport, I mostly just copied code from RawTransport, retaining the two-segment implementation.
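
To make the distinction concrete, here is a minimal sketch of the verbs-side idea (this is not eRPC's actual RawTransport code; the queue pair, lkeys, and buffer pointers are placeholders): one send work request carries the header and the payload as two scatter/gather entries, so the NIC gathers both segments on transmit without any memcpy.

#include <infiniband/verbs.h>

// Post one send WR whose scatter/gather list points at the packet header and
// the payload in place. All arguments are placeholders supplied by the caller.
static int post_two_segment_send(struct ibv_qp *qp, void *hdr, uint32_t hdr_len,
                                 uint32_t hdr_lkey, void *payload,
                                 uint32_t payload_len, uint32_t payload_lkey) {
  struct ibv_sge sge[2];
  sge[0] = {reinterpret_cast<uint64_t>(hdr), hdr_len, hdr_lkey};
  sge[1] = {reinterpret_cast<uint64_t>(payload), payload_len, payload_lkey};

  struct ibv_send_wr wr = {};
  wr.sg_list = sge;
  wr.num_sge = 2;  // two segments, zero copies
  wr.opcode = IBV_WR_SEND;
  wr.send_flags = IBV_SEND_SIGNALED;

  struct ibv_send_wr *bad_wr = nullptr;
  return ibv_post_send(qp, &wr, &bad_wr);  // the NIC gathers both segments on TX
}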

A zero-copy TX approach might be possible even with DPDK by using rte_pktmbuf_attach_extbuf(), after registering the hugepage memory used by eRPC with rte_extmem_register().
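
A rough sketch of that zero-copy idea follows (untested, and not code from eRPC). It assumes IOVA-as-VA mode, so a virtual address can stand in for the IOVA, and names like register_msgbuf_region, hugepage_base, pool, payload, and shinfo are placeholders.

#include <rte_mbuf.h>
#include <rte_memory.h>

// One-time setup: tell DPDK about the hugepage region backing eRPC's msgbufs.
// Some PMDs additionally require rte_dev_dma_map() on the same region.
static int register_msgbuf_region(void *hugepage_base, size_t hugepage_len,
                                  size_t page_sz) {
  // Passing NULL for iova_addrs keeps the sketch simple; outside IOVA-as-VA
  // mode, the per-page IOVAs would have to be supplied here instead.
  return rte_extmem_register(hugepage_base, hugepage_len, nullptr, 0, page_sz);
}

// eRPC owns and frees msgbuf memory itself, so the extbuf free callback is a no-op.
static void extbuf_free_cb(void * /*addr*/, void * /*opaque*/) {}

// Per-packet TX path: attach the payload bytes as an external buffer instead of
// memcpy-ing them into the mbuf. shinfo must outlive the mbuf, i.e. stay valid
// until the PMD frees the mbuf after transmission.
static struct rte_mbuf *make_zero_copy_mbuf(struct rte_mempool *pool, void *payload,
                                            uint16_t payload_len,
                                            struct rte_mbuf_ext_shared_info *shinfo) {
  struct rte_mbuf *m = rte_pktmbuf_alloc(pool);
  if (m == nullptr) return nullptr;

  shinfo->free_cb = extbuf_free_cb;
  shinfo->fcb_opaque = nullptr;
  rte_mbuf_ext_refcnt_set(shinfo, 1);

  // IOVA-as-VA assumption: the virtual address doubles as the DMA address.
  rte_pktmbuf_attach_extbuf(m, payload, (rte_iova_t)(uintptr_t)payload,
                            payload_len, shinfo);
  m->data_len = payload_len;
  m->pkt_len = payload_len;
  return m;  // chain this after a header mbuf and pass to rte_eth_tx_burst()
}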

cxz66666 commented 1 year ago

Thanks for your reply! I'm new to DPDK, so I haven't looked into rte_pktmbuf_attach_extbuf or rte_extmem_register yet. But I did run the large_rpc_tput benchmark, and the result is VERY inspiring! 😄

Here are my test results, based on the DPDK transport and the large_rpc_tput test, with 1 client thread and 1 server thread.

| req_size | before (Gb/s) | after (Gb/s) |
| --- | --- | --- |
| 2 KB | 18.37 | 19.84 |
| 4 KB | 23.54 | 26.69 |
| 8 KB | 26.67 | 30.40 |

I will open a PR later. Thanks again for your reply.

And by the way, could you find time to update the example files in the /apps folder? Many of them can't be compiled correctly 😢 (because of the naming convention).