-
Hi all. I encountered some problems when building SwiftTransformer as a dependency for DistServer.
My GPU is a 4090.
```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Bu…
```
-
Happy New Year, NCCL developers and community members!
Recently I have been trying to find the upper bound of NCCL allreduce performance in our network environment. I tried various methods and referred t…
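When reasoning about an allreduce upper bound, the usual starting point is the ring-allreduce traffic model (the same `2*(n-1)/n` factor nccl-tests uses to compute "bus bandwidth"). Below is a minimal sketch of that model; the function names are illustrative, not part of any NCCL API:

```python
# Sketch of the ring-allreduce performance model commonly used to bound
# allreduce time. Function names here are illustrative, not NCCL APIs.

def allreduce_bus_bw(size_bytes: float, time_s: float, n_ranks: int) -> float:
    """Bus bandwidth in GB/s: algorithm bandwidth scaled by 2*(n-1)/n,
    the fraction of the buffer each rank must send in a ring allreduce."""
    algo_bw = size_bytes / time_s / 1e9            # GB/s of user data moved
    return algo_bw * 2 * (n_ranks - 1) / n_ranks   # per-link traffic factor

def min_allreduce_time(size_bytes: float, link_bw_GBps: float, n_ranks: int) -> float:
    """Lower bound on allreduce time given per-rank link bandwidth (GB/s),
    ignoring latency terms: each rank sends 2*(n-1)/n of the buffer."""
    traffic = size_bytes * 2 * (n_ranks - 1) / n_ranks
    return traffic / (link_bw_GBps * 1e9)

# Example: 1 GiB allreduce over 8 ranks with 25 GB/s per-rank links.
t = min_allreduce_time(2**30, 25.0, 8)
print(f"best-case time: {t * 1e3:.2f} ms")
```

If a measured run reaches a bus bandwidth close to the link bandwidth, the collective is already near this bound and further tuning mostly affects the latency (small-message) regime.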
-
Hi NCCL team. I have read your blog [doubling all2all performance with nccl 2.12](https://developer.nvidia.com/blog/doubling-all2all-performance-with-nvidia-collective-communication-library-2-12).
…
-
Megatron-LM training hangs with the following error message: ReduceScatter failed to finish within the timeout (30 mins). It is tricky that NCCL reports no error log. I have no idea ho…
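When NCCL itself stays silent, the first step is usually to turn on NCCL's own logger before the job initializes NCCL, so a hung collective at least leaves a trace. `NCCL_DEBUG`, `NCCL_DEBUG_SUBSYS`, and `NCCL_DEBUG_FILE` are standard NCCL environment variables; the sketch below just sets them from Python before process-group initialization would happen (a launcher-side `export` works equally well):

```python
import os

# Hedged sketch: enable NCCL's logging so a stuck ReduceScatter leaves a
# per-host/per-pid trace. These must be set before NCCL is initialized
# (i.e. before torch.distributed.init_process_group in a Megatron launcher).
os.environ["NCCL_DEBUG"] = "INFO"
os.environ["NCCL_DEBUG_SUBSYS"] = "INIT,COLL"            # init + collectives
os.environ["NCCL_DEBUG_FILE"] = "/tmp/nccl.%h.%p.log"    # %h=host, %p=pid

print(os.environ["NCCL_DEBUG"], os.environ["NCCL_DEBUG_FILE"])
```

The last collective each rank logged before the hang usually identifies which rank (or which collective mismatch) the others are waiting on.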
-
Hi Team,
I am running the nccl-bw:ib test for an H100 cluster using superbench, but the bandwidth we are getting is very low: around 64 Gb/s, when it should be around 400 Gb/s. I am currently n…
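One pitfall worth ruling out before debugging the fabric: IB link speeds are quoted in gigabits per second, while NCCL benchmarks usually report gigabytes per second, an 8x difference. A quick unit sanity check (pure illustration, no superbench API involved):

```python
# Hedged sketch: sanity-check units before concluding the fabric is slow.
# IB NDR is 400 Gb/s (bits); NCCL tools typically report GB/s (bytes).

def gbps_to_GBps(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second."""
    return gbps / 8.0

ndr_GBps = gbps_to_GBps(400.0)      # a 400 Gb/s link tops out at 50 GB/s payload
print(ndr_GBps)                     # -> 50.0

# If the reported "64" is GB/s, it already exceeds one 400 Gb/s link
# (plausible with multiple rail-optimized NICs). If it is Gb/s, the run
# reaches only 64/400 = 16% of line rate and is genuinely slow.
print(64 / 400)                     # -> 0.16
```

So the first question is which unit superbench is printing; the debugging path differs completely between the two readings.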
-
### Bug description
Training freezes when using `ddp` on a Slurm cluster (`dp` runs as expected). The dataset is loaded via torchdata from an S3 bucket. Similar behaviour also arises when using webda…
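A common cause of this pattern (freeze under `ddp` while `dp` works) is ranks drawing unequal numbers of samples from a streaming source, so some ranks block in a gradient allreduce that others never enter. A minimal sketch of sharding truncated to equal lengths; `shard_equal` is a hypothetical helper, not a torchdata API:

```python
# Hedged sketch: round-robin shard a key list across ranks, truncated so
# every rank sees the same number of samples and thus the same step count.
# `shard_equal` is a hypothetical helper, not part of torchdata.

def shard_equal(keys: list[str], rank: int, world_size: int) -> list[str]:
    """Assign keys[rank::world_size], dropped to the shortest shard length."""
    per_rank = len(keys) // world_size   # drop the remainder entirely
    return keys[rank::world_size][:per_rank]

keys = [f"s3://bucket/sample-{i}" for i in range(10)]   # illustrative keys
shards = [shard_equal(keys, r, 4) for r in range(4)]
print([len(s) for s in shards])   # -> [2, 2, 2, 2]
```

With naive `keys[rank::world_size]` sharding, rank 0 here would get 3 samples while ranks 1–3 get 2, which is exactly the mismatch that makes DDP's collectives hang.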
-
### 🐛 Describe the bug
When I run the [examples/language/gpt/gemini/run_gemini.sh](https://github.com/hpcaitech/ColossalAI/blob/main/examples/language/gpt/gemini/run_gemini.sh) script based on the official…
-
Hi NCCL team,
Looks like there is no official build of v2.5.7-1 for download at https://developer.nvidia.com/nccl/nccl-download. Do you plan to add one?
-
Hello team,
I noticed you have been updating most of the recent tagged releases with the `-aws` suffix. I think I can assist if you need help testing on other libfabric providers.
We have some …
-
Hi there,
I want to ask about the performance comparison between the int32 and fp16 datatypes when using the allreduce API. I am not sure whether it is normal, but the int32 latency is almost 6x larger than…
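From message size alone, the naive expectation is only a ~2x gap: an int32 element is twice as wide as an fp16 element, so an allreduce of the same element count moves twice the bytes. A quick sketch of that baseline (pure arithmetic, no NCCL calls):

```python
# Hedged sketch: baseline expectation from payload size alone. An int32
# allreduce of N elements moves 2x the bytes of an fp16 allreduce of N
# elements, so ~2x latency is the naive bound; a 6x gap suggests a cause
# beyond data volume (protocol/algorithm switch, reduction kernel cost,
# message-size regime), which is worth profiling.

BYTES_PER_ELEM = {"fp16": 2, "int32": 4}

def payload_bytes(dtype: str, n_elems: int) -> int:
    """Bytes of user data in an allreduce of n_elems elements of dtype."""
    return BYTES_PER_ELEM[dtype] * n_elems

n = 1 << 20  # 1M elements
ratio = payload_bytes("int32", n) / payload_bytes("fp16", n)
print(ratio)   # -> 2.0
```

If the 6x holds across a range of sizes (not just small messages, where latency rather than bandwidth dominates), comparing the two runs with NCCL_DEBUG=INFO to see whether the algorithm/protocol selection differs would be a reasonable next step.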