NVIDIA
/
nccl
Optimized primitives for collective multi-GPU communication
Other
3.26k
stars
827
forks
issues
Why group calls (`ncclGroupStart()` and `ncclGroupEnd()`) are invoked in `ncclSend()` and `ncclRecv()`
#1521
ZhiyiHu1999
opened
20 hours ago
0
Is it safe or recommended to use multiple communicators for real distributed training
#1520
ZhiyiHu1999
opened
3 days ago
0
Unable to use multiple NICs
#1519
thecodingwizard
opened
3 days ago
4
Problem with SHMEM creation on startup.
#1518
davies-w
closed
3 days ago
4
torch.distributed.DistBackendError: NCCL error
#1517
Chevolier
opened
4 days ago
1
Could not enable P2P between devices
#1516
ZhiyiHu1999
opened
4 days ago
1
Nccl socketStartConnect: Connect to x.x.x.x<xxxx> failed : Software caused connection abort
#1515
913871734
opened
6 days ago
3
torch.distributed.DistBackendError: NCCL error in ProcessGroupNCCL.cpp:1275
#1514
shenshaowei
opened
1 week ago
1
nccl capture error
#1513
freshduer
opened
1 week ago
0
How does nccl register the memory region by ibv_reg_xxxx?
#1512
jinhao2
closed
4 days ago
1
nccl-test Indicates a performance problem
#1511
yalbaba
opened
1 week ago
13
GPU Direct RDMA Disabled for HCA
#1510
hiennguyennq
opened
1 week ago
3
NCCL hung with NCCL_P2P_USE_CUDA_MEMCPY=1 by pytorch
#1509
adofirst2018
opened
2 weeks ago
5
what is the count value in allreduce function?
#1508
BSkim26
opened
2 weeks ago
0
align the peer mem access check with API document
#1507
changchengx
opened
2 weeks ago
3
allgather performance using NVLS is poor
#1506
telala
opened
2 weeks ago
13
Question on DGX A100 GPU topologies
#1505
YJHMITWEB
closed
2 weeks ago
2
Encounter NCCL error when running PyTorch example code
#1504
Noblezhong
opened
2 weeks ago
5
Difference between readLL() and readLLFinish() in prims_ll.h
#1503
ZhiyiHu1999
opened
3 weeks ago
0
nccl topo about PHB and NODE
#1502
jianzi123
opened
3 weeks ago
1
Makefile: Add nccl_common.h and nccl_tuner.h to INCEXPORTS
#1501
martin-belanger
closed
3 weeks ago
4
Can NCCL and model forward propagation (CUDA matrix operations) be executed simultaneously, and if so, how can this be achieved?
#1500
liweiqing1997
opened
3 weeks ago
0
ncclInternalError: Internal check failed
#1499
whiteyn
opened
3 weeks ago
3
Question about ncclTopoConnectNodes in topo.cc
#1498
JK-Jiagn
closed
3 weeks ago
2
same data all reduce on H20, but results are different
#1497
Rainlin007
opened
3 weeks ago
10
result of sendrecv_perf is wrong
#1496
yanminjia
opened
3 weeks ago
1
P2P in non-blocking mode
#1495
kwen2501
opened
3 weeks ago
3
Alternating rings cause bad performance (NIC sending PFC) in a cluster with mixed crossNic=0/1 nodes
#1494
huzhiwen93
opened
4 weeks ago
0
A question about sequences of functions called in nccl/src/transport/net.cc
#1493
ZhiyiHu1999
opened
4 weeks ago
0
NCCL Logs Questions
#1492
gjit-juniper
opened
1 month ago
0
Question about tree channel
#1491
networkResearcher
opened
1 month ago
0
Question about the topology of double binary tree
#1490
networkResearcher
opened
1 month ago
0
Fix comm ready check in ncclIbIsend and ncclIbIrecv
#1489
WWeiOne
opened
1 month ago
1
nvlsAllocateMem report error
#1488
shanleo2024
closed
3 weeks ago
7
1000x latency with `ncclSend` and `ncclRecv`
#1487
goelayu
closed
1 month ago
10
NCCL WARN socketProgress: Connection closed by remote peer
#1486
ganyu1992
opened
1 month ago
2
The internal errors caused by host-side multi-threads
#1485
themoonstone
opened
1 month ago
0
How to use the profiler plugin?
#1484
jxh314
opened
1 month ago
1
what are the nvidia IMEX channels in Multi-Node NVLink (MNNVL)
#1483
vvmex
opened
1 month ago
1
The reason for breakdown "NCCL WARN comm 0x55ec10690820 has already been destroyed"
#1482
chenhongyu2048
closed
1 month ago
2
what will happen if i call ncclAbort after ncclSend
#1481
freshduer
opened
1 month ago
0
The nsys profile will hang when NCCL_P2P_USE_CUDA_MEMCPY is enabled
#1480
PhdShi
opened
1 month ago
5
question about PXN
#1479
anchenchai
opened
1 month ago
0
Some question about NCCL_IB_SCA
#1478
shanleo2024
closed
1 month ago
2
How to interpret NCCL's NVTX data
#1477
martin-belanger
closed
1 month ago
1
Question about hostStream, deviceStream and userStream
#1476
MC952-arch
opened
1 month ago
0
Does FabricManager support isolation at GPU card granularity in MNNVL environment?
#1475
hailiyidishui
opened
1 month ago
1
nvidia-peermem nv_get_p2p_free_callback:127 ERROR detected invalid context, skipping further processing
#1474
hmeScaler
closed
1 month ago
2
Why tree algorithms are specifically targeted at All-Reduce?
#1473
jxh314
opened
1 month ago
1
ncclCommSplit in non-blocking API mode
#1472
kwen2501
opened
1 month ago
6