-
Hi
NCCL version: v2.21.5
When I set NCCL_P2P_USE_CUDA_MEMCPY=1 and train a ResNet model using PyTorch with two GPUs on the same NUMA node, NCCL hangs and PyTorch crashes with a timeout.
PyTorch error:
`[rank1]:[E Proces…
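A minimal way to confirm the environment variable is the trigger is to launch the same job with and without it. This is a hypothetical repro sketch — `train_resnet.py` stands in for the actual training script, which is not given in the report:

```shell
# Baseline (default P2P path): expected to complete normally.
torchrun --nproc_per_node=2 train_resnet.py

# With the copy-engine path enabled: reported to hang until the
# ProcessGroupNCCL watchdog timeout fires. NCCL_DEBUG=INFO is added
# so the last NCCL activity before the hang is visible in the logs.
NCCL_P2P_USE_CUDA_MEMCPY=1 \
NCCL_DEBUG=INFO \
torchrun --nproc_per_node=2 train_resnet.py
```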
-
### Description
See https://github.com/ray-project/ray/pull/47845
RFC doc: https://docs.google.com/document/d/1zu9SllrEAjPHqs-eeITtrSSbv0rBxtkyCJeweZJl100/edit?usp=sharing
### Use case
_No resp…
-
In PR #84 we are adding support for NCCL TL. If UCC was built with NCCL support TL NCCL might be selected by CLs for CUDA collectives i.e. when both source and destination buffers are of memory type C…
-
Scheduling two containers on the same node results in significantly lower nccl-tests performance than scheduling the two containers on different nodes.
An experiment to schedule two containers to diffe…
-
### Description
See https://github.com/ray-project/ray/pull/47141#discussion_r1747392605
### Use case
_No response_
-
I compiled nccl-tests with the command:
```shell
make MPI=1 MPI_HOME=${NVHPC_ROOT}/comm_libs/12.4/hpcx/hpcx-2.19/ompi NCCL_HOME=${NVHPC_ROOT}/comm_libs/nccl CUDA_HOME=${NVHPC_ROOT}/cuda
```
And run th…
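For context, a build produced by a `make MPI=1 …` invocation like the one above is typically run through `mpirun`; a sketch of a common invocation (rank count and size sweep are assumptions, not taken from the report):

```shell
# Two MPI ranks, one GPU per rank, message sizes swept from 8 B to 128 MB,
# doubling each step.
mpirun -np 2 ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 1
```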
-
When running both training and testing of the model on a single GPU (./tools/fp16/dist_train.sh ./projects/configs/bevformer_fp16/bevformer_tiny_fp16.py 1), I get this error:
```
RuntimeError: N…
-
My test environment consists of two nodes, each equipped with eight GPUs; the GPU model is A800-SXM4-80GB, as shown in the following figure:
node1
![Image](https://github.com/user-atta…
-
Hello,
I have been going through the logging functionality in NCCL and wanted to know if there is a way to determine the global ranks of the devices that are involved in a collective operation. Curr…
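NCCL's logging is controlled through environment variables, and the communicator-initialization messages include each process's rank within the communicator. A sketch of turning logging up enough to correlate that information with collective activity (`./my_app` is a placeholder for the actual launch command):

```shell
# INIT lines report each rank as communicators are created; COLL lines
# trace collective operations. %h and %p in NCCL_DEBUG_FILE expand to
# hostname and PID, giving one log file per process.
NCCL_DEBUG=INFO \
NCCL_DEBUG_SUBSYS=INIT,COLL \
NCCL_DEBUG_FILE=/tmp/nccl.%h.%p.log \
./my_app
```

Note that the ranks NCCL reports are communicator-local; mapping them to the framework's global ranks is up to the caller when sub-communicators are in use.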