-
**Some software versions:**
nccl-tests: 2.13.9
openmpi: 4.1.5
rdma ofed: 23.10-1.1.9.0
nvidia-driver: 535.104.12-1
cuda: 11.4.4-1
nccl: 2.21.5-1
**Command**
mpirun --allow-run-as-root -…
-
I am new to Soft-RoCE and ran into trouble while working with it.
I want to use MPI with Soft-RoCE.
Is there an MPI interface in Soft-RoCE, and if so, where is it?…
-
MOAB jobs registered by ROCED were disappearing.
- Because ROCED issues three showq requests with short delays between them, jobs can disappear between requests. This should be solve…
-
I've synthesized the design and run it on an AU-250. It works properly: the `dmesg` logs show **RDMA Enabled 1**, and the device provides me with a MAC and IP.
When I try and run t…
-
### Describe the bug
I am not sure this is a UCX bug. Hopefully someone can give me ideas about next steps.
I have two identical servers with BCM57414 hardware. They are connected by a direct-at…
-
Had to modify IB dashboards:
```
# default - doesn't work
irate(node_infiniband_port_packets_transmitted_total{job=~"$job",instance=~"$instance",device=~"$device"}[60s])
# works:
irate(node_i…
```
-
I recently had an issue with adding an interface in a separate network namespace. The setting is as follows:
I have 2 servers, running Ubuntu 16.04 with kernel version 4.14. Each server has 4 network…
-
When an Ethernet device is added, the rxe_cfg script does not set the RDMA CM default RoCE mode. Therefore, before the RDMA CM can be used for that device, the rxe_cfg script must be used to…
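For illustration only, here is a minimal Python sketch of setting the default RoCE mode by hand. It assumes the setting in question is the kernel's `rdma_cm` configfs attribute `default_roce_mode`. The device name `rxe0`, the helper names, and the `root` parameter are hypothetical and exist only for this example. It does not reproduce what rxe_cfg itself does.

```python
from pathlib import Path

# Mode strings accepted by the rdma_cm configfs attribute.
ROCE_MODES = ("IB/RoCE v1", "RoCE v2")


def roce_mode_path(device, port=1, root="/sys/kernel/config/rdma_cm"):
    """Return the path of the default_roce_mode attribute for a device port."""
    return Path(root) / device / "ports" / str(port) / "default_roce_mode"


def set_default_roce_mode(device, mode, port=1, root="/sys/kernel/config/rdma_cm"):
    """Write the default RoCE mode for `device` (hypothetical helper).

    On a real system this needs root privileges and a mounted configfs.
    Creating the device directory asks the kernel to populate ports/<n>/.
    """
    if mode not in ROCE_MODES:
        raise ValueError(f"unknown RoCE mode: {mode!r}")
    (Path(root) / device).mkdir(exist_ok=True)
    roce_mode_path(device, port, root).write_text(mode + "\n")


def get_default_roce_mode(device, port=1, root="/sys/kernel/config/rdma_cm"):
    """Read back the current default RoCE mode for `device`."""
    return roce_mode_path(device, port, root).read_text().strip()
```

With a device named `rxe0` (an assumed name), `set_default_roce_mode("rxe0", "RoCE v2")` followed by `get_default_roce_mode("rxe0")` would write the mode and read it back. The `root` argument is there so the helpers can be tried against a scratch directory instead of the live configfs tree.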
-
...do not work. Sometimes there are only arrows, sometimes a "1" appears between them. When I click on any of them, sometimes something changes and sometimes nothing does. As a user, I feel that I have no control over it at all and…
-
Hello
Currently, our client company is supporting nccl-test.
We are supporting it by writing the script below.
mpirun -np 300 -N 1 -x NCCL_DEBUG=INFO --hostfile /nccl/hostfile \
-mca plm_rsh_no_…