NVIDIA/nccl
Optimized primitives for collective multi-GPU communication
Other
3.27k
stars
829
forks
issues
net_ib: return ncclSuccess if read roceTypePath failed and errno is E…
#1427
limu713
opened
2 months ago
3
[Question] Why are thread affinities not set for nccl proxy threads?
#1426
joerowell
closed
2 months ago
3
[SHARP] Aggregation Manager Fails: Local Port validation failed
#1425
nariaki3551
closed
2 months ago
1
[Question] NCCL kernels occupy the full SM?
#1424
chenhongyu2048
closed
2 months ago
5
Not all gpus have nvlinks, the communication data is all incorrect
#1423
zhaowujin
opened
2 months ago
4
How does NCCL deal with Virtual Functions
#1422
Linhagupta
closed
2 months ago
4
Does NCCL support CUDA Graph for scale-out scenarios?
#1421
yejunguo
opened
2 months ago
0
delete duplicated code in sendProxySetup() in net.cc
#1420
kais-nvidia
opened
2 months ago
0
prterun noticed that process rank 48 with PID 613162 on node gpu-node07 exited on signal 11 (Segmentation fault).
#1419
safeAndSound3
opened
3 months ago
2
Does NCCL use compute resources when in GDRDMA mode?
#1418
Sere-Fu
opened
3 months ago
0
[Bug] For AllReduce communications with a shape mismatch, some cases will hang while others will not.
#1417
YanjieGao
opened
3 months ago
5
Is point to point halo communication 'null peer' possible?
#1416
huccpp
opened
3 months ago
2
How Does NCCL Select Topologies for Different Collective Operations During Machine Learning Training
#1415
KirilDan
opened
3 months ago
1
Overlapping DtoH with NCCL
#1414
htzho
opened
3 months ago
0
Improve ReduceScatter
#1413
GeofferyGeng
closed
3 months ago
2
NCCL Infiniband Issue
#1412
JuiceLemonLemon
closed
3 months ago
7
Why NCCL_CROSS_NIC option can improve the bandwidth of ring allreduce?
#1411
ProHuper
opened
3 months ago
0
NCCL ncclUnhandledCudaError: Call to CUDA function failed
#1410
xiejibing
opened
3 months ago
3
NCCL timeout issue
#1409
wd255
opened
3 months ago
5
No binary build for 2.22.3 for NCCL on PyPi Cuda11
#1408
Skylion007
opened
3 months ago
1
Performance Degradation in Multi-Process vs. Multi-Threaded Execution of NCCL Tests on 8 H800 GPUs
#1407
polarstormx
closed
3 months ago
3
There is a mismatch between ncclTopoTrimSystem and ncclTopoCompute
#1406
shanleo2024
closed
3 months ago
4
NCCL failure caused by NET/IB completion error
#1405
thomasbarrett
opened
3 months ago
4
Why does NCCL pass a pointer rather than `struct ncclDevKernelArgs` itself to `ncclKernelMain`?
#1404
YconquestY
closed
3 months ago
2
Allreduce bus bandwidth is very low and unstable when ECE (enhanced connection establishment) is enabled.
#1403
sandyhouse
opened
3 months ago
4
Which path will be chosen with the specific TOPO?
#1402
shanleo2024
closed
3 months ago
4
Documentation: default of NCCL_IB_SPLIT_DATA_ON_QPS is wrong
#1401
y1r
closed
3 months ago
1
NCCL all-reduce test failure due to TL_SHM ERROR
#1400
thsmfe001
opened
3 months ago
0
How NCCL utilizes shared memory with the dynamic tensor shape varies across training iterations?
#1399
szhengac
opened
3 months ago
7
how to Improve VLLM KVCACHE Transfer Efficiency with NCCL P2P Communication
#1398
liweiqing1997
opened
3 months ago
0
Why choose 20.6 as Hopper GPU’s NVLink bandwidth?
#1397
polarstormx
closed
2 days ago
1
[Question] Is SendRecv always block GPU?
#1396
vincentccc
opened
3 months ago
1
How I can modify the source code to change the send data size to 16K in IB verbs?
#1395
shanleo2024
opened
3 months ago
0
Why different shape of tensor can be all reduced when using nccl as backend?
#1394
yjzhong89
opened
3 months ago
0
Why only flush once using the last non-zero receive?
#1393
clearsky07
opened
3 months ago
0
Issues with Limited HCA Utilization and RDMA in Multi-node Training
#1392
asdfry
closed
3 months ago
7
Why NCCL LL128 proto need to load data twice?
#1391
MARD1NO
opened
3 months ago
0
Could anyone provide some suggestions to help me optimize my NCCL code for transmitting KV cache to improve performance?
#1390
liweiqing1997
opened
3 months ago
0
Will ncclSend, ncclRecv launched in different cuda streams blocking each other?
#1389
billwu01
closed
3 months ago
1
RuntimeError: NCCL error: internal error - please report this issue to the NCCL developers
#1388
emmanuelrajapandian
opened
3 months ago
4
add ncclLastResult to report library error
#1387
ganyu1992
closed
3 months ago
0
Is there any option to use copy engine in ncclSend and ncclRecv ?
#1386
umiswing
opened
4 months ago
0
Data transfer from shared buffer to network
#1385
ZhiyiHu1999
closed
1 month ago
0
Some questions about how NCCL uses IB network for data transmission
#1384
clearsky07
opened
4 months ago
0
Fix the issue of printing data overflow.
#1383
wangfakang
opened
4 months ago
1
NCCL with WARN socketTryAccept: Accept failed: Bad file descriptor
#1382
syyxsxx
opened
4 months ago
5
torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1691, unhandled system error (run with NCCL_DEBUG=INFO for details), NCCL version 2.19.3 ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. Last error:
#1381
1moye
opened
4 months ago
0
Does NCCL support DOCA GPUNetIO?
#1380
songhexiang
opened
4 months ago
0
question about balanced tree
#1379
Vikram111-pix
opened
4 months ago
0
Low bandwidth of AllReduce over long-range connection with high latency (0.25ms)
#1378
yanminjia
opened
4 months ago
0