-
-
Would be cool. Kind of hard to test in our situation.
-
Training with 1 node and 8 GPUs works, but training with 2 nodes and 16 GPUs does not.
1. Train with 2 nodes (a & b), each with 8 GPUs.
2. Run the following shell script: ./tools/dist_train.sh config…
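For what it's worth, MMDetection-style `dist_train.sh` scripts usually read the rendezvous settings (`NNODES`, `NODE_RANK`, `MASTER_ADDR`, `PORT`) from the environment; if node a and node b do not agree on everything except `NODE_RANK`, you get two separate 8-GPU jobs instead of one 16-GPU job. A minimal sketch of the per-node environment, assuming that script interface (the helper name and the `10.0.0.1` address are placeholders):

```python
def node_env(node_rank: int, master_addr: str, nnodes: int = 2, port: int = 29500) -> dict:
    """Environment one node exports before calling ./tools/dist_train.sh (hypothetical helper)."""
    return {
        "NNODES": str(nnodes),        # total nodes in the job
        "NODE_RANK": str(node_rank),  # 0 on node a, 1 on node b
        "MASTER_ADDR": master_addr,   # reachable address of node a (rank 0)
        "PORT": str(port),            # same rendezvous port on every node
    }

env_a = node_env(0, "10.0.0.1")  # "10.0.0.1" is a placeholder for node a's address
env_b = node_env(1, "10.0.0.1")
print(env_a["NODE_RANK"], env_b["NODE_RANK"])  # 0 1
```

Node a would export `env_a` and node b `env_b` before running the script; everything except `NODE_RANK` must match on both nodes.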
-
Does thundersvm support multi-GPU training? Will multiple GPUs make training faster?
-
I want to run full.py on multiple GPUs, but only one GPU is used.
```
Using bfloat16 Automatic Mixed Precision (AMP)
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
-----------------------…
```
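One common cause of a single-GPU run is that only one device is visible to the process, or that the launcher started a single worker (which matches `MEMBER: 1/1`, i.e. world size 1). A minimal diagnostic sketch, assuming only the standard `CUDA_VISIBLE_DEVICES` semantics (the helper name is hypothetical):

```python
import os

def visible_gpu_count(default: int = 0) -> int:
    """Count GPUs exposed via CUDA_VISIBLE_DEVICES (hypothetical diagnostic helper)."""
    val = os.environ.get("CUDA_VISIBLE_DEVICES")
    if val is None:
        return default            # variable unset: all GPUs visible, count unknown here
    ids = [d for d in val.split(",") if d.strip()]
    return len(ids)               # an empty string hides every GPU

os.environ["CUDA_VISIBLE_DEVICES"] = "0"   # e.g. set by a launcher or shell profile
print(visible_gpu_count())  # 1 -> consistent with a world size of 1
```

If this prints 1 even though the machine has more GPUs, check what exports `CUDA_VISIBLE_DEVICES` before the script starts and how many worker processes the launcher actually spawns.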
-
I'm using the CNTK 2.5 C++ v2 library for evaluation.
It works sometimes, but it throws `CUDNN_STATUS_INTERNAL_ERROR` at random if I use multiple GPUs.
`cuDNN failure 4: CUDNN_STATUS_INTERNAL_ERROR ; GPU…
zakki updated 6 years ago
-
Hi,
1. When I run the command on 8 GPUs:
```
python3 qalora.py --model_path $llama_7b_4bit_g32
```
it raises the following error:
```
File "/home/shawn/anaconda3/envs/qalora/lib/python3.8/site-pa…
```
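The traceback is truncated, so the cause is unclear; a common first check on an 8-GPU box is whether the single-process script behaves when pinned to one GPU. A hedged workaround sketch (the helper name is hypothetical); note the variable must be set before any CUDA-using library is imported:

```python
import os

# Pin before importing torch/bitsandbytes/etc., otherwise the setting is ignored.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"   # expose only GPU 0 to this process

def pinned_gpus() -> list:
    """GPU ids this process will see (hypothetical helper)."""
    return os.environ["CUDA_VISIBLE_DEVICES"].split(",")

print(pinned_gpus())  # ['0']
```

If the script runs cleanly on one GPU, the failure is likely in how the code handles multiple visible devices rather than in the model itself.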
-
### 🐛 Describe the bug
train command:
export NGPUS=2
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=9531 ./train.py
but after training for a while, ra…
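As a side note, each worker spawned by `torch.distributed.launch` (with `--use_env`) or the newer `torchrun` receives its `LOCAL_RANK` through the environment and should bind itself to that GPU before initializing the process group. A minimal sketch of the rank lookup, with the CUDA call left as a comment since it needs a GPU (the helper name is hypothetical):

```python
import os

def get_local_rank(env=None) -> int:
    """Read the per-worker rank the launcher provides (hypothetical helper)."""
    env = os.environ if env is None else env
    return int(env.get("LOCAL_RANK", "0"))

local_rank = get_local_rank()
# In train.py you would then bind the worker to its GPU before init_process_group:
# torch.cuda.set_device(local_rank)
print(local_rank)
```

A worker that never binds to its own device can silently put both ranks on GPU 0, which often surfaces as an OOM or a NCCL error some time into training.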
-
Hello Everyone,
As I don't have a single GPU with enough VRAM, I thought about modifying the [loader.py](https://github.com/GAIR-NLP/anole/blob/main/chameleon/inference/loader.py) to add accelerate…
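Splitting a model that does not fit on one GPU generally comes down to building a map from layer names to devices, which is what accelerate's `device_map` concept expresses. A naive contiguous-split sketch, independent of accelerate (the function name is hypothetical):

```python
def contiguous_device_map(layer_names, n_gpus):
    """Map each named layer to a device, filling GPUs in contiguous blocks (naive sketch)."""
    per_gpu = -(-len(layer_names) // n_gpus)  # ceil division: layers per GPU
    return {name: f"cuda:{i // per_gpu}" for i, name in enumerate(layer_names)}

layers = [f"layers.{i}" for i in range(4)]
print(contiguous_device_map(layers, 2))
# {'layers.0': 'cuda:0', 'layers.1': 'cuda:0', 'layers.2': 'cuda:1', 'layers.3': 'cuda:1'}
```

Contiguous blocks keep cross-GPU transfers to one hop per boundary; accelerate's own `infer_auto_device_map` additionally weighs each layer's memory footprint, so this sketch is only a starting point.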
-
Does the llm module support inference with `vllm` or on multiple GPUs?
If not, when will these features be implemented?
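I can't speak for the module's roadmap, but vLLM itself exposes multi-GPU tensor-parallel inference through `LLM(..., tensor_parallel_size=N)`. A minimal sketch of the arguments you would pass, with the actual vLLM call left commented out since it needs GPUs and an installed `vllm` (the helper name and model path are placeholders):

```python
def build_vllm_kwargs(model_path: str, n_gpus: int) -> dict:
    """Arguments for vllm.LLM; tensor_parallel_size shards weights across GPUs (sketch)."""
    return {"model": model_path, "tensor_parallel_size": n_gpus}

kwargs = build_vllm_kwargs("my-model", 2)  # "my-model" is a placeholder path
# from vllm import LLM        # requires vllm and >= 2 visible GPUs
# llm = LLM(**kwargs)
print(kwargs["tensor_parallel_size"])  # 2
```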