-
### Version
1.21
### Describe the bug.
I used the code from the tutorial to train ImageNet (https://github.com/NVIDIA/DALI/blob/main/docs/examples/use_cases/pytorch/resnet50/main.py); I have six…
twmht updated 3 weeks ago
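For context, the DALI ResNet50 example runs one process per GPU and is started through a distributed launcher such as `torchrun --nproc_per_node=6 main.py <args>` (the exact arguments depend on the script version). A minimal sketch, assuming a torchrun launch, of the per-process setup such a script relies on:

```python
# Minimal sketch of the per-process setup a torchrun launch relies on;
# illustrative only, not the DALI example's actual code.
import os
import torch
import torch.distributed as dist

def init_distributed():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for every process it spawns
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")  # reads rank/world size from the env
    return local_rank

if __name__ == "__main__":
    local_rank = init_distributed()
    print(f"rank {dist.get_rank()} of {dist.get_world_size()} on GPU {local_rank}")
    dist.destroy_process_group()
```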
-
Hi, I want to train your model on multiple GPUs, but I am getting errors. Can you help me in this regard?
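Multi-GPU errors of this kind usually come from the script not being wrapped in DistributedDataParallel or not being started with a distributed launcher. A generic, self-contained DDP sketch (a toy model, not this repository's code) that runs under `torchrun --nproc_per_node=<num_gpus>`:

```python
# Toy DDP training loop, launched as: torchrun --nproc_per_node=<num_gpus> train_ddp.py
# (a generic sketch, not this repository's code)
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

dist.init_process_group(backend="nccl")      # torchrun provides the env variables
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = DDP(nn.Linear(32, 2).cuda(local_rank), device_ids=[local_rank])
dataset = TensorDataset(torch.randn(1024, 32), torch.randint(0, 2, (1024,)))
sampler = DistributedSampler(dataset)        # each rank sees a different shard
loader = DataLoader(dataset, batch_size=64, sampler=sampler)

opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(2):
    sampler.set_epoch(epoch)                 # reshuffle differently every epoch
    for x, y in loader:
        x, y = x.cuda(local_rank), y.cuda(local_rank)
        opt.zero_grad()
        loss_fn(model(x), y).backward()      # DDP averages gradients across GPUs
        opt.step()

dist.destroy_process_group()
```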
-
File "/data4/azuryl/DoRA/commonsense_reasoning/finetune.py", line 410, in
[rank0]: fire.Fire(train)
[rank0]: File "/home/azuryl/anaconda3/envs/dora/lib/python3.10/site-packages/fire/core.py"…
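The traceback above is cut off before the actual exception, so the root cause is not visible here. Fine-tuning scripts in the alpaca-lora family, which DoRA's finetune.py resembles, typically switch between single- and multi-GPU mode based on the WORLD_SIZE variable that `torchrun` sets; a hedged sketch of that pattern (illustrative, not necessarily DoRA's exact code):

```python
# Hedged sketch of the WORLD_SIZE-based DDP switch common in alpaca-lora-style
# fine-tuning scripts; names are illustrative and may differ from DoRA's code.
import os
import torch

world_size = int(os.environ.get("WORLD_SIZE", 1))
ddp = world_size != 1
if ddp:
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)
    device_map = {"": local_rank}   # pin the whole model to this rank's GPU
else:
    device_map = "auto"             # single process: spread layers over visible GPUs
print(f"ddp={ddp}, device_map={device_map}")
```

When this pattern is present, the script is started with `torchrun --nproc_per_node=<num_gpus> finetune.py …` rather than plain `python finetune.py`.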
-
How to train with multiple GPUs
-
### Model Series
Qwen2.5
### What are the models used?
Qwen2.5-7B
### What is the scenario where the problem happened?
transformers
### Is this a known issue?
- [X] I have followed [the GitHub …
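A hedged sketch, not Qwen's official recipe, of how transformers commonly uses several GPUs for Qwen2.5-7B: `device_map="auto"` shards the weights across all visible GPUs for inference, while multi-GPU training with Trainer normally comes from launching the script with `torchrun --nproc_per_node=<N>` (one full model copy per GPU), without `device_map="auto"`:

```python
# Hedged sketch, not Qwen's official recipe: device_map="auto" shards the 7B
# weights across all visible GPUs for inference. For multi-GPU training with
# Trainer, the script is instead launched with `torchrun --nproc_per_node=<N>`
# (one full model copy per GPU) and device_map="auto" is not used.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,   # roughly halves memory versus fp32
    device_map="auto",            # places layers on cuda:0, cuda:1, ... automatically
)

prompt = tokenizer("Give me a short introduction to large language models.",
                   return_tensors="pt").to(model.device)
output = model.generate(**prompt, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```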
-
Hi, Prof. Luo. Thanks for your excellent work. Does the training process only work on a single GPU? As you can see, I got many errors when training with multiple GPUs.
-
Does the project support multi-GPU training?
If yes, how? By default, it only uses one GPU. I am unable to find any parameter that can be used for this purpose.
Snimm updated 2 months ago
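When a codebase exposes no multi-GPU parameter at all, the least invasive option is `torch.nn.DataParallel`, which splits each batch across all visible GPUs from a single process; DistributedDataParallel is faster but requires launcher and code changes. A small sketch with a toy model:

```python
# Quick drop-in when a codebase has no multi-GPU flag: nn.DataParallel splits
# each batch across all visible GPUs from a single process. It is slower than
# DistributedDataParallel but needs no launcher or config changes.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)   # replicates the model on every GPU
model = model.cuda()

x = torch.randn(64, 128).cuda()      # the batch is scattered across the GPUs
print(model(x).shape)                # outputs gathered back on GPU 0 -> [64, 10]
```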
-
Hi Dr. Liu, very nice work.
When I set ddp to True in emage.yaml:
`ddp: False` to `ddp: True`
and modified `--gpus` in config.py:
`parser.add("--gpus", default=[0], type=int, nargs="*")` to `…
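A `--gpus` list like this is usually consumed by a `torch.multiprocessing.spawn` loop that starts one DDP worker per listed device. A hedged sketch of that pattern (illustrative only, not EMAGE's actual trainer code):

```python
# Hedged sketch of the mp.spawn pattern a `--gpus 0 1` list usually drives;
# illustrative only, not EMAGE's actual trainer code.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, gpus):
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=len(gpus))
    torch.cuda.set_device(gpus[rank])
    # ... build the model, wrap it in DistributedDataParallel, run the loop ...
    dist.destroy_process_group()

if __name__ == "__main__":
    gpus = [0, 1]                    # mirrors `--gpus 0 1` from config.py
    mp.spawn(worker, args=(gpus,), nprocs=len(gpus))
```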
-
How do I set up multi-GPU parallel training?
-
I noticed in your reply that training on an A100 consumed 39 G of memory. I used four 4090 GPUs for training but still got an out-of-memory error. I wonder if you could provide a version for multi-GPU…
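Worth noting: with plain DDP every GPU keeps a full copy of the model, optimizer state and activations, so four 24 GB 4090s do not pool into one large memory space and can still run out of memory where a single ~39 GB A100 run fits. Sharding approaches such as FSDP (or gradient checkpointing plus a smaller per-GPU batch) are the usual workaround; a minimal FSDP sketch, not the project's code:

```python
# Minimal FSDP sketch (not the project's code): parameters, gradients and
# optimizer state are sharded across ranks instead of replicated, so the
# per-GPU footprint drops as more GPUs are added.
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")      # launch with: torchrun --nproc_per_node=4 train.py
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(8)]).cuda(local_rank)
model = FSDP(model)                  # each rank stores only its shard of the weights
print(f"rank {dist.get_rank()}: wrapped model ready")
dist.destroy_process_group()
```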