-
How do I run training on multiple GPUs? As far as I can see, training runs on a single GPU.
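One common way to use every visible GPU in PyTorch is `DistributedDataParallel` launched with `torchrun`; a minimal sketch (the model, optimizer, and script name here are placeholders, not taken from the project's code):

```python
# Minimal DDP sketch -- launch with:
#   torchrun --nproc_per_node=4 train.py
# torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK in the environment.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    dist.init_process_group("nccl")                  # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(10, 10).cuda(local_rank)  # stand-in for the real model
    model = DDP(model, device_ids=[local_rank])
    # ... build a DataLoader with a DistributedSampler and train as usual ...
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Each process owns one GPU; the `DistributedSampler` ensures every rank sees a disjoint shard of the data.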
-
### Description
Hello,
I am testing multi-node training with three servers, each equipped with different GPUs (H100*8, A40*4, L40S*4).
During the process, I encountered an NCCL error and seek assista…
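A first debugging step for NCCL errors, especially on heterogeneous multi-node clusters, is to turn on NCCL's own logging before the process group is initialized. These are standard NCCL environment variables, not settings specific to this project; the interface name is an assumption you must adjust:

```python
# Enable verbose NCCL logging; must be set before init_process_group().
import os

os.environ["NCCL_DEBUG"] = "INFO"             # print topology and error details
os.environ["NCCL_DEBUG_SUBSYS"] = "INIT,NET"  # focus on init and networking
# Nodes with different NICs often pick mismatched interfaces; pinning a common
# one frequently resolves cross-node init failures ("eth0" is a placeholder):
os.environ.setdefault("NCCL_SOCKET_IFNAME", "eth0")
```

The `INFO` log shows which interface and transport each rank chose, which usually narrows down where the handshake fails.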
-
Does this code support multi-GPU usage for searching? If not, how would one add it?
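If the codebase has no distributed support at all, the smallest retrofit is PyTorch's `DataParallel`, which splits each batch across all visible GPUs inside a single process (the model below is a stand-in, not the project's actual model):

```python
# Simplest multi-GPU retrofit: wrap the model in DataParallel.
# Batches are split along dim 0 and replicas run on each visible GPU.
import torch

model = torch.nn.Linear(8, 8)                 # placeholder for the search model
if torch.cuda.device_count() > 1:
    model = torch.nn.DataParallel(model).cuda()
```

`DataParallel` is easy to bolt on but slower than `DistributedDataParallel`; it is a reasonable first step to check that the workload parallelizes at all.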
-
Distributed training, single node, 4 GPUs.
```
...
ep 33 train, step 10, ctc_4 2.259, ctc_8 1.880, ctc 1.840, num_seqs 10, max_size:time 237360, max_size:out-spatial 52, mem_usage:cuda:1 6.6GB, 0…
```
-
### System Info
4*NVIDIA L20
### Who can help?
_No response_
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officially suppor…
-
I want to ask if it is possible to use multiple GPUs. I tried entering 0,1, but the option didn't work in the training GUI. Please correct me if I'm missing something.
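When a GUI or script only accepts a single device index, GPU visibility can still be controlled from the environment. `CUDA_VISIBLE_DEVICES` is a standard CUDA variable, but it only takes effect if set before CUDA is first initialized (in practice, before the first `torch`/framework import in many setups):

```python
# Expose only the first two GPUs; they will then appear as devices 0 and 1.
# Must run before CUDA is initialized by any framework import.
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
```

Equivalently, the variable can be set in the shell that launches the GUI, which avoids any import-order concerns.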
-
[2024-06-12 19:36:07,800] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-12 19:36:09,648] [WARNING] [runner.py:202:fetch_hostfile] Unable to fi…
-
### Before Asking
- [X] I have read the [README](https://github.com/meituan/YOLOv6/blob/main/README.md) carefully.
- [X] I want to train my custom dataset, and I have read …
-
Hi,
Thanks for releasing the code! I am working on retraining the model using pytorch lightning. It works perfectly when I use a single A100 GPU, but I always get NaN loss when using multiple GPUs.…
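To localize which operation first produces the NaN, PyTorch's autograd anomaly detection makes `backward()` raise with a traceback to the offending forward op (it is slow and debug-only; PyTorch Lightning also exposes this as the `Trainer(detect_anomaly=True)` flag). A tiny self-contained sketch:

```python
# Debug-only: autograd raises at the first op whose gradient is NaN,
# with a traceback pointing at the corresponding forward call.
import torch

with torch.autograd.detect_anomaly():
    x = torch.tensor([1.0], requires_grad=True)
    loss = (x * 2).sum()
    loss.backward()   # would raise here if the graph had produced a NaN
```

Running this inside the multi-GPU setup on each rank usually reveals whether the NaN originates in the loss, a normalization layer, or the gradient all-reduce.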
-
I have adapted this using simple DataParallel from PyTorch, but the model sometimes outputs `NaN`s. Have you been able to train this across multiple GPUs on a single node?
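NaNs that appear only under data parallelism are often caused by BatchNorm computing statistics on small per-replica batches. A commonly suggested remedy, sketched here with a placeholder model rather than the repository's actual one, is to switch to `DistributedDataParallel` and convert BatchNorm layers to `SyncBatchNorm` so statistics are computed over the global batch:

```python
# Hedged sketch: swap BatchNorm -> SyncBatchNorm before wrapping in DDP,
# so normalization statistics cover the whole effective batch.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))  # stand-in model
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)        # BN -> SyncBN
# In the distributed script, then:
#   model = nn.parallel.DistributedDataParallel(
#       model.cuda(local_rank), device_ids=[local_rank])
```

`SyncBatchNorm` only takes effect inside an initialized process group; under plain `DataParallel` it is not supported, which is one more reason to prefer DDP here.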