NVIDIA/nccl
Optimized primitives for collective multi-GPU communication
Other
3.28k stars
831 forks
issues
get rid of some unbounded sprintfs
#1330
madeleineth
closed
3 months ago
0
Why can't two GPUs in a virtual machine communicate using P2P?
#1329
qianxiaoliang
opened
5 months ago
1
NCCL error "receiving 524288 bytes instead of 65536"
#1328
JingyuQian
closed
4 months ago
2
How does the double binary tree communicate?
#1327
ltm920716
closed
4 months ago
4
Why does theoretical busBw multiply by the ratio 5/6?
#1326
cyqmonkey
opened
5 months ago
0
fix stack smash
#1325
madeleineth
closed
4 months ago
1
How to locate the hanging node?
#1324
Puzzzle7
opened
5 months ago
1
nccl with specified pkey_index
#1323
wengyao04
closed
5 months ago
1
For channel computing, why nvlinkBw is accumulated, but pciBw is not? Is this a BUG?
#1322
GodHforever
opened
5 months ago
2
Does NCCL support inter-node communication through NVSwitch and NVLink?
#1321
shanleo2024
closed
5 months ago
8
Execute all_reduce_perf block
#1320
zhaotyer
closed
5 months ago
1
How sendProxyProgress() in net.cc works
#1319
ZhiyiHu1999
opened
5 months ago
2
How are threads in different channels parallelized
#1318
ZhiyiHu1999
opened
5 months ago
0
NCCL2.21 hangs at cudaLaunchKernelExC()
#1317
leiyi666
opened
5 months ago
6
Performance Degradation in Alltoall Operation with NCCL 2.19 and 2.20
#1316
GeofferyGeng
opened
5 months ago
5
Understanding LL, LL128, and Simple Protocols
#1315
arkhadem
opened
5 months ago
0
Compute time in the reduction operation
#1314
tks2004
opened
5 months ago
0
Some questions about selecting NET when searching channels.
#1313
shanleo2024
closed
5 months ago
12
Local user buffer registration for NVLink SHARP
#1312
zhang662817
opened
6 months ago
1
Added retries to EHOSTUNREACH socket error.
#1311
newellz2
opened
6 months ago
0
Profiling Tools for NCCL collective operations
#1310
Chen-Chang
opened
6 months ago
0
Dual 4090 bandwidth slower with PCIe
#1309
YZP17121579
closed
6 months ago
1
link against rdma-core libs when build with RDMA-CORE
#1308
changchengx
opened
6 months ago
1
nccl-test can use nvidia sharp, but training job can not use nvidia sharp
#1307
Lzhang-hub
closed
6 months ago
0
About NVLS MC/UC buffer
#1306
vvmex
opened
6 months ago
0
Why is P2P slower between two GPUs farther apart than PXB on an Intel CPU (without NVLink)?
#1305
HuangShiqing
opened
6 months ago
2
NCCL fallback to Ring,LL on broadcast perf and NCCL_ALGO=Tree
#1304
arttianezhu
opened
6 months ago
1
All Reduce Performance on H100 VMs
#1303
apoorvemohan
opened
6 months ago
1
Why duplicate nChannels in connect.cc
#1302
jxh314
opened
6 months ago
1
Why there are two IDs for MNNVL support?
#1301
dearsxx0918
closed
6 months ago
2
How do collective operations call runRing, runTreeUpDown, and runTreeSplit
#1300
ZhiyiHu1999
opened
6 months ago
1
How is the logic for allocating data across different channels?
#1299
jxh314
closed
1 month ago
1
all-reduce slower on v2.20.5 compared to v2.18.5 on AWS g5.48xlarge (8 x A10G)
#1298
abdulfatir
opened
6 months ago
15
Internal error when submitting a job to a Ray cluster
#1297
troelsfr
opened
6 months ago
3
What's the relationship between NCCL protocols and inter-node communication?
#1296
Alex-Wong
opened
6 months ago
0
NCCL_NET_GDR_READ's performance impact on a PCIe platform
#1295
cold2stone
opened
6 months ago
3
Does ncclBroadcast call return at same time on different ranks?
#1294
Eiji911
opened
6 months ago
1
How to understand "bank" in net.cc?
#1293
dearsxx0918
closed
6 months ago
0
Failed to find ncclNetPlugin_v8 symbol
#1292
wwj-2017-1117
opened
6 months ago
4
nccl-tests with two GH200 over Quantum2 iB stuck
#1291
itzsimpl
opened
6 months ago
2
Inquiry about NCCL's Tree Algorithm Performance in Single and Dual Machine Scenarios
#1290
fizzlover
opened
6 months ago
0
NCCL stuck when using nccl-test.
#1289
deepzzz123
opened
6 months ago
3
One of the NODE will hang when NCCL_NET_GDR_READ=1
#1288
shanleo2024
closed
4 months ago
1
How can this be ported to Windows?
#1287
eabase
opened
6 months ago
4
How can I identify level1 nvswitch and level2 nvswitch in NCCL
#1286
Ryan201802
opened
6 months ago
12
AMD EPYC 7K62 NCCL-test 4090 bandwidth too
#1285
ghoul02015
opened
6 months ago
1
an NCCL timeout and the error "torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate more than 1EB memory"
#1284
jessiewy
closed
6 months ago
5
RuntimeError: NCCL Error 1: unhandled cuda error (run with NCCL_DEBUG=INFO for details) when torch._C._broadcast_coalesced
#1283
zhoulei-biubiu
opened
6 months ago
2
Why doesn't NCCL ring all-reduce stream duration scale with the theoretical (N-1)/N?
#1282
CraneQinghe
opened
6 months ago
1
Why is allgather's busbw a little worse than allreduce/reducescatter for the same nccl environment variables
#1281
pkuleo
opened
6 months ago
1