-
Hi dear developer,
I have a question about MNNVL and NVLS.
If the whole topology supports MNNVL, for example with 5 nodes in total and 8 GPUs per node, then the total number of GPUs is 40, w…
-
Hello, I encountered some problems while using this code for multi-GPU training.
First, I tried to run it with
`python3 train_dafnet.py --model_name "llama-2-7b" --device 0 --extra_device 1 2 3`
an…
-
I compiled the code from source, v1.8.3.
The command is: `NCCL_DEBUG=INFO NCCL_ALGO=Tree,Ring,,CollnetDirect,CollnetChain,NVLS ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 8`
and the output is:
N…
-
### Your current environment
```text
The output of `python collect_env.py`
```
### How would you like to use vllm
I am trying to deploy a Qwen2-72b model in k8s, with 4 GPUs in one node. According…
-
I found that there are two test patterns, one for in_place and one for out_of_place. What is the difference between the two? I also found that I need to add an offset when using in_place; why do I n…
-
Hi, I want to test all_reduce_perf with P2P over PCIe on H20. However, H20 is equipped with NVLink, so NCCL all_reduce_perf always transfers data over NVLink. How can I get the p2p with PC…
cll24 updated
2 months ago
-
## Error
When I use 2 GPUs to train a FLUX LoRA, everything is fine and training succeeds, but when I use one GPU, or start with 2 GPUs but use only one, I get the error below.
I tried:
export …
-
In the following code, what does the variable `width` mean? These `width` values are collected in the code; what are they ultimately used for? Thank you very much!
```c
struct ncclTopoLink {
  int type;
  float wid…
```
-
Hi. Thanks for the amazing work. I am trying to run it in a Windows environment with Python 3.10, but I couldn't. I am getting this error: Collecting nvidia-nccl-cu12
Downloading nvidia-nccl-cu12-0.0.1.dev…
Abocg updated
1 month ago
-
Hi!
In your paper, you mentioned that including text-only data in training is crucial for maintaining language abilities. I'm currently performing full fine-tuning using LLaMA Factory, and I'm enc…