erpc-io / eRPC

Efficient RPCs for datacenter networks
https://erpc.io/

Enabling eRPC use without hugepages #23

Open vsbenas opened 5 years ago

vsbenas commented 5 years ago

Our machines have two NUMA nodes, but only one is connected to the network. Hence, eRPC runs efficiently on half of the cores, but the other half experience significant performance issues.

This pull request is inspired by the HERD architecture, which uses heap memory when numa_node is set to -1: https://github.com/efficient/rdma_bench/blob/master/libhrd/hrd_conn.c#L117

It is now possible to pass erpc::kNoNumaNode as numa_node to the Nexus constructor so that eRPC does not use NUMA memory. Obviously, this is slightly slower when such nodes are available, but in our configuration it achieved a 20.6% 23.1% performance increase, so I believe it is a good option for eRPC to have.

TODO:

anujkaliaiitd commented 5 years ago

Thank you for the pull request!

Would it be good enough to allow each Rpc object to choose its own NUMA node instead of inheriting the Nexus's NUMA node? That can easily be done by adding an optional argument to the Rpc constructor.
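A minimal sketch of what such an optional constructor argument could look like. NexusSketch, RpcSketch, and kInheritNumaNode are hypothetical names made up for illustration; they are not eRPC's actual classes or signatures.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical sentinel meaning "inherit the Nexus's NUMA node".
static constexpr size_t kInheritNumaNode = std::numeric_limits<size_t>::max();

// Hypothetical stand-in for the Nexus: it only carries a NUMA node here.
struct NexusSketch {
  size_t numa_node;
};

// Hypothetical stand-in for an Rpc object that may override the NUMA node.
class RpcSketch {
 public:
  // When the caller omits numa_node, the Nexus's node is inherited.
  explicit RpcSketch(const NexusSketch &nexus,
                     size_t numa_node = kInheritNumaNode)
      : numa_node_(numa_node == kInheritNumaNode ? nexus.numa_node
                                                 : numa_node) {}

  size_t numa_node() const { return numa_node_; }

 private:
  size_t numa_node_;
};
```

With this shape, existing callers that pass only the Nexus keep their current behavior, and only callers that want a different node need to change.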

The ability to work without hugepages is nice, but I would like to avoid the additional complexity unless we really need it.

vsbenas commented 5 years ago

That depends on whether Rpc objects register the node's memory with the NIC. I could not create a Nexus using the second NUMA node: it fails with "Failed to register mr". That node is not connected to the network.

I don't know if such a scenario is at all common (two nodes, only one on the network), but the performance is much better using regular memory in our case. So "do we need it" really depends on how common such a setup is.

About the complexity, it adds an extra branch inside the ~HugeAlloc() loop and one branch in HugeAlloc(). In terms of performance it should be negligible, but I understand that the code becomes more cluttered.
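The two branches described above can be sketched roughly as follows. This is an illustrative, Linux-only allocator written for this discussion, not eRPC's HugeAlloc: AllocSketch, kNoNumaNodeSketch, and the 4096-byte alignment are assumptions, NUMA binding and NIC registration are omitted, and the branch on the allocation path sits in alloc() rather than in the constructor.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <vector>
#include <sys/mman.h>

// Hypothetical stand-in for the erpc::kNoNumaNode sentinel named in this PR.
static constexpr int kNoNumaNodeSketch = -1;

// Illustrative allocator: heap memory when no NUMA node is given,
// hugepage-backed mmap otherwise.
class AllocSketch {
 public:
  explicit AllocSketch(int numa_node)
      : use_heap_(numa_node == kNoNumaNodeSketch) {}

  AllocSketch(const AllocSketch &) = delete;
  AllocSketch &operator=(const AllocSketch &) = delete;

  ~AllocSketch() {
    for (const Region &r : regions_) {
      // The extra branch in the destructor loop: each region is released
      // with the call that matches how it was obtained.
      if (use_heap_) {
        free(r.ptr);
      } else {
        munmap(r.ptr, r.size);
      }
    }
  }

  void *alloc(size_t size) {
    void *p = nullptr;
    if (use_heap_) {
      // Heap path: page-aligned ordinary memory, no hugepages required.
      if (posix_memalign(&p, kPageSize, size) != 0) {
        throw std::runtime_error("posix_memalign failed");
      }
    } else {
      // Hugepage path: size must be a multiple of the hugepage size.
      p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
      if (p == MAP_FAILED) {
        throw std::runtime_error("hugepage mmap failed");
      }
    }
    regions_.push_back({p, size});
    return p;
  }

  bool uses_heap() const { return use_heap_; }
  size_t num_regions() const { return regions_.size(); }

 private:
  struct Region {
    void *ptr;
    size_t size;
  };
  static constexpr size_t kPageSize = 4096;  // assumed alignment

  bool use_heap_;
  std::vector<Region> regions_;
};
```

Both branches test a flag that is fixed at construction time, which is consistent with the expectation that the runtime cost is negligible.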

anujkaliaiitd commented 5 years ago

Thanks for the details. It's common to have only one CPU connected to the network, so it's important that eRPC works in this setting.

I'm unsure why registration fails with the second NUMA node. I'll look at this over the next few days.