-
### Description
I use Ray in an HPC cluster. The cluster has InfiniBand, which offers low latency and high bandwidth. Ray is built on gRPC and uses gRPC for data transfer as well. I can use IPoIB (Internet …
-
### Issue
I am trying to run the EFA EKS NCCL tests, but the public ECR image (public.ecr.aws/w6p6i9i7/aws-efa-nccl-rdma) referenced [here](https://github.com/aws-samples/aws-efa-eks/blob/main/exampl…
-
# LITE Kernel RDMA
Paper Read: LITE Kernel RDMA. Next week I’ll give a presentation in Advanced Network, a graduate course. Our teacher provided a paper list about Computer Netwo...
-
Is there any interest in support for the above? We've always wanted direct support in moosefs, but now we feel much better about lizardfs upstream, so might be willing to investigate this ourselves.
-
In my test case, I create two connections, A to B and B to A, then send messages from A to B and from B to A. Sometimes my process dumps core, or the message send fails.
This happens only with RDMA; TCP is fine.
1. coredump:
Program …
-
For example, `ibv_query_gid_table` and `ibv_query_gid_ex` were introduced after rdma-core v31.0. They are used to get all GIDs for a device; compared to calling `ibv_query_gid` on every valid GID index, these …
-
docker.io/dptechnology/unicore:latest-pytorch1.12.1-cuda11.6-rdma
-
The function _prepare_recv_notify_before_using_rdma_write_ should be called before calling _rdma_accept_; otherwise, the notification sent by the client may not be received in time, leading to a block …
-
I am running into segmentation faults (SIGSEGV), most likely due to running out of libfabric contexts with SST RDMA transport. I am running a simple reader-writer scenario from your examples where I s…
-
Current head of master:
```c
base/btl_base_am_rdma.c:200:29: warning: invalid use of structure with flexible array member [-Wpedantic]
200 | mca_btl_base_rdma_hdr_t hdr;
| …
```
rhc54 updated 3 years ago