Closed (wukuser closed this issue 3 years ago)
Hi,
Could you give some more information about your hardware configuration? For example, which NIC are you using, and what is the link type (RoCE or InfiniBand (IB))? Are the two nodes correctly connected over RDMA? As mentioned in #5, Wukong has so far only been tested on IB, not on RoCE.
Thanks!
Hi,
Thank you for your reply.
The links are indeed InfiniBand, and the machines use Mellanox ConnectX-4 adapters. Note that, to avoid an error at startup, I had to add the following to $WUKONG_ROOT/deps/openmpi-1.6.5-install/share/openmpi/mca-btl-openib-device-params.ini:
[Mellanox ConnectX4]
vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3,0x119f
vendor_part_id = 4115
use_eager_rdma = 1
mtu = 4096
max_inline_data = 256
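In case it helps anyone checking a similar stanza, here is a small Python sketch that parses the entry above and tests whether a given adapter ID is covered by it. It assumes that Open MPI selects a stanza by comparing the device's vendor ID and part ID against vendor_id and vendor_part_id, and that a ConnectX-4 reports PCI ID 15b3:1013 (0x1013 is 4115 in decimal); both are my understanding rather than something confirmed in this thread, so compare against what lspci -nn shows on your machine. The helper name stanza_matches is made up for illustration.

```python
import configparser

# The stanza quoted above, copied verbatim.
STANZA = """
[Mellanox ConnectX4]
vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3,0x119f
vendor_part_id = 4115
use_eager_rdma = 1
mtu = 4096
max_inline_data = 256
"""


def stanza_matches(ini_text, section, vendor_id, part_id):
    """Return True if (vendor_id, part_id) is listed in the given section.

    Hypothetical helper: it only mirrors the matching rule assumed in the
    text above, it is not Open MPI's own lookup code.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    entry = parser[section]
    # vendor_id is a comma-separated list of hexadecimal values.
    vendor_ids = {int(v, 16) for v in entry["vendor_id"].split(",")}
    # vendor_part_id is written in decimal here; base 0 also accepts 0x... forms.
    part_ids = {int(p, 0) for p in entry["vendor_part_id"].split(",")}
    return vendor_id in vendor_ids and part_id in part_ids


if __name__ == "__main__":
    # Assumed ConnectX-4 PCI ID 15b3:1013; replace with the IDs from your host.
    print(stanza_matches(STANZA, "Mellanox ConnectX4", 0x15B3, 0x1013))
```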
Dear all,
I'm trying to run Wukong with RDMA enabled, but I'm getting this error at loading time:
It seems that others had this issue:
Answers in issue #5 mention using a new version of libRDMA, but I can't access it. Answers in issue #15 mention changing ports or adding SO_REUSEPORT to setsockopt(), but I tried both and I'm still getting the same error. Any ideas?
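For reference, this is roughly what I understand the SO_REUSEPORT suggestion to mean, written as a minimal Python sketch rather than the actual C++ change. It assumes Linux (where SO_REUSEPORT is available), and make_listener is a hypothetical name; where exactly Wukong or its RDMA library calls setsockopt() is not shown here. The key point is that the option has to be set before bind(), on every socket that shares the port.

```python
import socket


def make_listener(port=0, host="127.0.0.1"):
    """Create a listening TCP socket with SO_REUSEPORT set before bind().

    Hypothetical helper for illustration only. port=0 lets the kernel
    pick a free port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow rebinding while old connections linger in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Allow several sockets to bind the same address and port (Linux).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(1)
    return sock


if __name__ == "__main__":
    first = make_listener()
    port = first.getsockname()[1]
    # A second socket can bind the same port because both set SO_REUSEPORT.
    second = make_listener(port)
    print("both listeners bound to port", port)
    first.close()
    second.close()
```

If the option is set after bind(), or only on one of the two sockets, the second bind still fails with "Address already in use", which is one thing worth double-checking when the change appears to have no effect.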