-
Hi,
I am trying to run NVAE on my machine with your CIFAR-10 command line (changing only the .. from 8 to 4, since I own 4 GPUs):
```
export EXPR_ID=/home/dsi/eyalbetzalel/NVAE/logs
export…
```
-
When I compile Caffe with NCCL, there are errors:
src/caffe/parallel.cpp: In instantiation of ‘void caffe::NCCL::Run(const std::vector&, const char*) [with Dtype = float]’:
src/caffe/parallel.cpp:37…
-
The unit test named in the title has been using a fixed seed to mask flakiness. Suggested action:
1. Evaluate whether the test is flaky without the fixed seed. If not, remove the seed; otherwise move to 2.
2. If test is fla…
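Step 1 above can be checked mechanically: rerun the assertion body many times without seeding and count failures. This is only a sketch; the sampled values and the assertion are hypothetical stand-ins for the real test logic.

```python
import random

def run_flaky_candidate(trials=200):
    """Run the assertion body `trials` times with no fixed seed and
    return the failure count; 0 suggests the seed can be dropped."""
    failures = 0
    for _ in range(trials):
        # Hypothetical assertion body: replace with the real test's logic.
        sample = [random.random() for _ in range(10)]
        ok = sum(sample) < 10  # trivially true here; a real test may not be
        if not ok:
            failures += 1
    return failures
```

If the count is consistently zero across many runs, the seed is masking nothing and can be removed.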
-
When I run _Meta-Llama-3-8B-Instruct_ or _Meta-Llama-3.1-8B-Instruct_ with
1. python 3.12.5
2. scalellm 0.1.9+cu118torch2.2.2
3. torch 2.2.2+cu1…
-
Currently, we build two wheel variants: `xgboost-cpu` (which excludes GPU code) and `xgboost` (where the GPU code targets CUDA 12.4). In #10729, `xgboost` is found to conflict with another package us…
-
Across scenarios including PCIe, RDMA, TCP/IP, and others, I am not sure what kind of test is appropriate.
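One way to cover each transport scenario is to run the same nccl-tests binary while steering NCCL with its standard environment variables; the binary path, GPU counts, and hostnames below are placeholders:

```shell
# TCP/IP sockets only: disable InfiniBand/RoCE and GPU peer-to-peer
NCCL_IB_DISABLE=1 NCCL_P2P_DISABLE=1 ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 4

# PCIe peer-to-peer (no NVLink): cap P2P at the PCIe level
NCCL_P2P_LEVEL=PXB ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 4

# RDMA (InfiniBand/RoCE) across two nodes via MPI, one rank per node
mpirun -np 2 -H host1:1,host2:1 -x NCCL_IB_DISABLE=0 \
    ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 1
```

Comparing the bus-bandwidth numbers across these runs shows which transport NCCL actually used in each case.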
-
I set max_steps=500 and save_steps=100.
When training reaches step 100, the checkpoint is saved successfully, but an NCCL timeout is reported.
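A checkpoint save on one rank can stall the other ranks past NCCL's default watchdog timeout. One common workaround is to raise the timeout when initializing the process group; the two-hour value below is an assumption to tune to your actual save duration:

```python
from datetime import timedelta

# Assumed value: long enough to cover the slowest checkpoint save.
NCCL_TIMEOUT = timedelta(hours=2)

def init_distributed():
    # Imported lazily so this sketch loads even without torch installed.
    import torch.distributed as dist
    # env:// rendezvous assumes torchrun-style MASTER_ADDR/RANK/WORLD_SIZE.
    dist.init_process_group(backend="nccl", timeout=NCCL_TIMEOUT)
```

Trainers that call `init_process_group` themselves usually expose this timeout through their own config instead.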
-
Hi. PyTorch distributed training and inference across multiple NVIDIA GPUs relies on the NCCL communication framework (https://github.com/NVIDIA/nccl). I eagerly need NCCL support in javacpp-pytorch. Thanks!
-
I am using the `mpirun` command to test the all_reduce_perf binary from nccl-tests on two servers within the same local area network. I can run other files normally with `mpirun`, but w…
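For reference, a minimal two-node invocation looks like the following; the hostnames, binary path, and network interface are placeholders for the actual setup:

```shell
# One rank per server (-np 2, -H host:slots), one GPU per rank (-g 1).
# NCCL_DEBUG=INFO prints which transport and interface NCCL selects;
# NCCL_SOCKET_IFNAME pins the bootstrap/socket traffic to one NIC.
mpirun -np 2 -H server1:1,server2:1 \
    -x NCCL_DEBUG=INFO \
    -x NCCL_SOCKET_IFNAME=eth0 \
    ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 1
```

If only all_reduce_perf hangs across nodes, the NCCL_DEBUG output usually shows whether the ranks fail while opening connections (firewall or wrong interface) rather than during the collective itself.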
-
I can add an NCCL tests example, but before I do it would be great to know whether that's something that would be accepted.