NVIDIA/nccl
Optimized primitives for collective multi-GPU communication
Other
3.27k
stars
826
forks
issues
Newest
local access violation work queue error when upgrading to v2.20.3-1
#1524
gangxie112
opened
13 hours ago
7
Questions about the FIFO of simple protocol
#1523
JK-Jiagn
opened
1 day ago
0
Any possibility/plan to support fused kernels?
#1522
dearsxx0918
opened
1 day ago
0
Why group calls (`ncclGroupStart()` and `ncclGroupEnd()`) are invoked in `ncclSend()` and `ncclRecv()`
#1521
ZhiyiHu1999
opened
4 days ago
0
Is it safe or recommended to use multiple communicators for real distributed training
#1520
ZhiyiHu1999
opened
1 week ago
0
Unable to use multiple NICs
#1519
thecodingwizard
opened
1 week ago
4
Problem with SHMEM creation on startup.
#1518
davies-w
closed
1 week ago
4
torch.distributed.DistBackendError: NCCL error
#1517
Chevolier
opened
1 week ago
1
Could not enable P2P between devices
#1516
ZhiyiHu1999
opened
1 week ago
1
Nccl socketStartConnect: Connect to x.x.x.x<xxxx> failed : Software caused connection abort
#1515
913871734
opened
1 week ago
4
torch.distributed.DistBackendError: NCCL error in ProcessGroupNCCL.cpp:1275
#1514
shenshaowei
opened
1 week ago
1
nccl capture error
#1513
freshduer
opened
1 week ago
0
How does nccl register the memory region by ibv_reg_xxxx?
#1512
jinhao2
closed
1 week ago
1
nccl-test Indicates a performance problem
#1511
yalbaba
opened
2 weeks ago
13
GPU Direct RDMA Disabled for HCA
#1510
hiennguyennq
opened
2 weeks ago
3
NCCL hung with NCCL_P2P_USE_CUDA_MEMCPY=1 by pytorch
#1509
adofirst2018
opened
2 weeks ago
5
what is the count value in allreduce function?
#1508
BSkim26
opened
2 weeks ago
0
align the peer mem access check with API document
#1507
changchengx
opened
2 weeks ago
3
allgather performance using NVLS is poor
#1506
telala
opened
2 weeks ago
13
Question on DGX A100 GPU topologies
#1505
YJHMITWEB
closed
2 weeks ago
2
Encounter NCCL error when running PyTorch example code
#1504
Noblezhong
opened
3 weeks ago
5
Difference between readLL() and readLLFinish() in prims_ll.h
#1503
ZhiyiHu1999
opened
3 weeks ago
0
nccl topo about PHB and NODE
#1502
jianzi123
opened
3 weeks ago
1
Makefile: Add nccl_common.h and nccl_tuner.h to INCEXPORTS
#1501
martin-belanger
closed
3 weeks ago
4
Can NCCL and model forward propagation (CUDA matrix operations) be executed simultaneously, and if so, how can this be achieved?
#1500
liweiqing1997
opened
3 weeks ago
0
ncclInternalError: Internal check failed
#1499
whiteyn
opened
3 weeks ago
3
Question about ncclTopoConnectNodes in topo.cc
#1498
JK-Jiagn
closed
3 weeks ago
2
same data all reduce on H20, but results are different
#1497
Rainlin007
opened
4 weeks ago
10
result of sendrecv_perf is wrong
#1496
yanminjia
opened
4 weeks ago
1
P2P in non-blocking mode
#1495
kwen2501
opened
1 month ago
3
Alternating rings cause bad performance (NIC sending PFC) in a cluster with mixed crossNic=0/1 nodes
#1494
huzhiwen93
opened
1 month ago
0
A question about sequences of functions called in nccl/src/transport/net.cc
#1493
ZhiyiHu1999
opened
1 month ago
0
NCCL Logs Questions
#1492
gjit-juniper
opened
1 month ago
0
Question about tree channel
#1491
networkResearcher
opened
1 month ago
0
Question about the topology of double binary tree
#1490
networkResearcher
opened
1 month ago
0
Fix comm ready check in ncclIbIsend and ncclIbIrecv
#1489
WWeiOne
opened
1 month ago
1
nvlsAllocateMem report error
#1488
shanleo2024
closed
4 weeks ago
7
1000x latency with `ncclSend` and `ncclRecv`
#1487
goelayu
closed
1 month ago
10
NCCL WARN socketProgress: Connection closed by remote peer
#1486
ganyu1992
opened
1 month ago
2
The internal errors caused by host-side multi-threads
#1485
themoonstone
opened
1 month ago
0
How to use the profiler plugin?
#1484
jxh314
opened
1 month ago
1
what is the nvidia IMEX channels in Multi-Node NVLink (MNNVL)
#1483
vvmex
opened
1 month ago
1
The reason for breakdown "NCCL WARN comm 0x55ec10690820 has already been destroyed"
#1482
chenhongyu2048
closed
1 month ago
2
what will happen if i call ncclAbort after ncclSend
#1481
freshduer
opened
1 month ago
0
The nsys profile will hang when NCCL_P2P_USE_CUDA_MEMCPY is enabled
#1480
PhdShi
opened
1 month ago
5
question about PXN
#1479
anchenchai
opened
1 month ago
0
Some question about NCCL_IB_SCA
#1478
shanleo2024
closed
1 month ago
2
How to interpret NCCL's NVTX data
#1477
martin-belanger
closed
1 month ago
1
Question about hostStream, deviceStream and userStream
#1476
MC952-arch
opened
1 month ago
0
Does FabricManager support isolation at GPU card granularity in MNNVL environment?
#1475
hailiyidishui
opened
1 month ago
1