NVIDIA/nccl-tests
NCCL Tests
BSD 3-Clause "New" or "Revised" License
809 stars, 229 forks
issues
Enable P2P on pcie in a nvlink machine
#250
cll24
opened
6 hours ago
0
Getting Avg bus bandwidth = 0 when running all_reduce_perf in nccl-tests in my EC2 G5.8x large
#249
rajeshvenkata
closed
2 days ago
2
Running in kubernetes pods Error
#248
drikster80
closed
1 day ago
2
NCCL all-reduce test failure due to TL_SHM ERROR; this happened in containers on the same server
#247
thsmfe001
closed
3 weeks ago
2
NCCL_Algo=Tree
#246
afattaholman
opened
1 month ago
1
What does dma_buf do when gpuDirectRdma is disabled ?
#245
Pavani-Panakanti
opened
1 month ago
1
Test NCCL Hang
#244
sdonoso
closed
1 month ago
2
Enhance Multi-Node NCCL Testing with Torch C10D Gloo Framework
#243
hexinw
opened
1 month ago
0
2-node NCCL test doesn't work for A100
#242
jeffreyyjp
closed
3 weeks ago
4
AllReduce Bus Bandwidth decreases with larger network latency
#241
chenzhu99
opened
1 month ago
0
doc: add all2all factor
#240
OrenLeung
closed
1 month ago
1
fix: nvls all reduce correction factor
#239
OrenLeung
opened
1 month ago
4
all_reduce algo factor for NVLink SHARP In network reductions
#238
OrenLeung
opened
1 month ago
0
how to calculate the tree based allreduce ib bw?
#237
echobinarybytes
opened
1 month ago
0
2-node NCCL test doesn't work
#236
SdEnd
opened
1 month ago
7
How do we comprehend the factor between algBw and busBw?
#235
lianghao208
opened
1 month ago
5
What's multi-allreduce ?
#234
ProHuper
opened
1 month ago
1
all_reduce_perf core dumped on 4 L20
#233
songh11
closed
2 weeks ago
23
NCCL Tree allreduce test cannot reach the theoretical bus bandwidth on 2 nodes with 4 nics
#232
ProHuper
closed
1 month ago
0
Test NCCL failure common.cu:997 'internal error
#231
sdonoso
closed
2 months ago
9
what is cu:990 error? how to solve this problem?
#230
MAKER-park
opened
2 months ago
5
2 Nodes nccl-test with mpi hangs
#229
sdonoso
closed
2 months ago
1
has nvswitch, but uses 0 nvls channels
#228
MiyazonoKaori
closed
2 months ago
3
Test fail caused by ibvwrap.c:160 NCCL WARN Call to ibv_modify_qp failed with error Connection timed out.
#227
thsmfe001
closed
2 months ago
2
improve parsing of stepbytes (increment size) argument
#226
StefanoSalsano
closed
1 month ago
1
stepbytes (increment size) argument does not support 1M notation
#225
StefanoSalsano
opened
2 months ago
1
alltoall_perf: each rank is only sending to half of the other ranks
#224
russilwvong
closed
2 months ago
14
mpirun all_reduce_perf hang with multi-device test
#223
913871734
opened
2 months ago
0
NCCL WARN Cannot use cuda/gdr transports as part of specified UCX_TLS
#222
liuxingbo12138
opened
3 months ago
5
how to support One Device per Process?
#221
jiangxiaobin96
closed
2 months ago
4
1 GiB headroom might be too small
#220
Namnamseo
opened
3 months ago
0
Test NCCL failure common.cu:959 'internal error - please report this issue to the NCCL developers / '
#219
Assassin187
opened
3 months ago
9
Rank Assignment Issue under four containers on two different servers.
#218
thsmfe001
closed
3 months ago
8
all_reduce_perf hangs; using a single GPU on a 4GPU machine
#217
isaacgerg
closed
3 months ago
21
NCCL initialization hangs with 4 GPUs, but works with 2 GPUs
#216
mickaelseznec
opened
3 months ago
4
NCCL_ALGO on multi-node and multi-GPU
#215
MajidSalimi
opened
3 months ago
1
SendRecv Time
#214
osayamenja
opened
4 months ago
2
NCCL test seems to run separately on multiple nodes
#213
jianh619
closed
4 months ago
6
H100 all reduce performance is poor
#212
liminn
opened
4 months ago
13
undefined reference nccl*
#211
gongyguo
closed
4 months ago
1
Differences problems in performance data of HGX A800 single server N GPUs nccl testing
#210
cloveryyg
opened
4 months ago
0
The network bandwidth in the alltoall_perf test failed to meet expectations
#209
fj1425fj
opened
4 months ago
4
Test NCCL failure common.cu:954 'unhandled cuda error
#208
YingYellow
closed
4 months ago
1
make failed, error -- unsupported GNU version! gcc versions later than 11 are not supported!
#207
jxh314
closed
4 months ago
0
misc/ibvwrap.cc:278 NCCL WARN Call to ibv_reg_mr_iova2 failed with error Cannot allocate memory
#206
jxh314
closed
4 months ago
2
cputime
#205
tks2004
opened
4 months ago
0
Test NCCL failure common.cu:961 'internal error - please report this issue to the NCCL developers / '
#204
a-c-dream
opened
5 months ago
7
Add bisection test
#203
x41lakazam
opened
5 months ago
3
Why doesn't getBw have access to agg_iters?
#202
x41lakazam
closed
5 months ago
1
Performance lack of NCCL Test
#201
shengode503
opened
6 months ago
5