nccl Search Results - Githubissues

1000+ results for nccl

NVIDIA/nccl #990

Can different GPU non-blocking streams be used to do communi…

Hello, I want to use two GPU non-blocking streams for communication and cuMemcpyAsync respectively to accelerate. GPU: V100 32GB NCCL:NCCL version 2.13.4+cuda11.7 and I use IB. I mean does nccl use …

raninbowlalala updated 1 year ago
1
pinokiofactory/whisper-webui #1

Application fails to start due to incompatible cuDNN version…

Hello, I encountered an error while trying to install and run the script on Linux. Script ran fine but when I tried to start the app I got an error message. ``` Traceback (most recent call last…

yusufipk updated 2 weeks ago
2
THUDM/VisualGLM-6B #125

Could an expert please take a look? Fine-tuning fails with a dimension mismatch; RuntimeError: The size of tensor a (25…

(base) root@6633711ec9b0:/home/data/VisualGLM-6B# bash finetune/finetune_visualglm_qlora.sh NCCL_DEBUG=info NCCL_IB_DISABLE=0 NCCL_NET_GDR_LEVEL=2 deepspeed --master_port 16666 --include localhost:0 …

xxuyyuan updated 2 months ago
21
NVIDIA/nccl #353

Info on collectives plugin

I am working on a plugin to use a different algorithm for allreduce. While I have been able to understand most of the code required, I still have a few questions: 1) I defined my plugin and run the…

AmedeoSapio updated 12 months ago
13
Azure/azureml-examples #675

train/pytorch/cifar-distributed not working

[2021-08-17T22:56:28.664111] Starting Linux command : python train.py --epochs 1 --data-dir /mnt/batch/tasks/shared/LS_root/jobs/opendatasetspmworkspace/azureml/6215701e-b1ef-42d0-91d1-864583d0db…

ManojBableshwar updated 3 years ago
2
NVIDIA/nccl #875

ProcessGroupNCCL.cpp:1191

Details: Traceback (most recent call last): File "/gf3/home/lei/zhenghao/Autoplanner/test/manual_pp/pipeline2x4_ptip.py", line 178, in run_stage() File "/gf3/home/lei/zhenghao/Autoplanner…

ZhengH-git updated 3 months ago
7
NVIDIA/cutlass #1919

[BUG] Cutlass python does not detect GPU

**Describe the bug** I am trying to use Cutlass Python and build it from source. My environment is formed by Ubuntu 18.04, cuda 11.8, GPU Nvidia Tesla V100 volta, python3.10, make 3.19 and GCC versio…

IzanCatalan updated 1 week ago
5
PaddlePaddle/PaddleX #1820

docker_paddlex3.0beta object detection error!! Already asked on WeChat and in the livestream!

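1. Following the official tutorial, the image classification test ran without any problems, but my own object detection test failed. The code I ran is as follows: ![微信图片_20240718103926](https://github.com/user-attachments/assets/7151813e-43dc-45c0-bdaa-b0b2dc0221a3) Loading Docker: docker run --name paddlex -v /model_p…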

fightingshao updated 3 months ago
2
aws-neuron/aws-neuron-sdk #998

Collective Permute Long Tail on trn1.32xlarge

I am launching nccl.collective_permute on a trn1.32xlarge. Within the workload, each neuron core sends data to neighboring worker following a pre-specified topology. However, some of the workers exper…

zhdllwyc updated 5 days ago
5
jax-ml/jax #17399

Intercepted XLA runtime error .

### Description The first time I encountered this error was run mult-node. Then after I run another code, single node also encountered this problem which was ok before. I think this error has s…

Nightbringers updated 11 months ago
1
