-
I trained YOLOv4 on my own dataset with the parameters **"-dont_show -map -gpus 1,2"**, and an error occurred while calculating mAP.
We have 2 Tesla K80 cards (4 GPUs in total), and I trained my dataset on GPUs …
-
Hello,
I have eight A40 GPUs (48 GB each), so I want to use all of them for training and inference.
But I can't find any multi-GPU support, such as DataParallel or DistributedDataParallel, in the train.py code, maybe …
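For context on what DataParallel/DistributedDataParallel do conceptually, here is a minimal pure-Python simulation of data parallelism. It is not this project's train.py and not PyTorch's implementation. It assumes a toy 1-D linear model `y_hat = w * x` with mean-squared-error loss, and the helper names (`mse_grad`, `shard`, `data_parallel_grad`) are hypothetical. Each simulated "device" computes a gradient on an equal-sized shard of the batch, and the per-device gradients are then averaged, which is the all-reduce step.

```python
from typing import List, Sequence


def mse_grad(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Gradient of mean((w * x - y) ** 2) with respect to w (toy model, assumption)."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def shard(seq: Sequence[float], num_devices: int) -> List[List[float]]:
    """Split a batch into num_devices contiguous, equal-sized shards."""
    if len(seq) % num_devices != 0:
        raise ValueError("batch size must be divisible by num_devices")
    size = len(seq) // num_devices
    return [list(seq[i * size:(i + 1) * size]) for i in range(num_devices)]


def data_parallel_grad(w: float, xs: Sequence[float], ys: Sequence[float],
                       num_devices: int) -> float:
    """Simulate data parallelism: local gradients per shard, then an averaging all-reduce."""
    x_shards = shard(xs, num_devices)
    y_shards = shard(ys, num_devices)
    # Each "device" only sees its own shard of the batch.
    local_grads = [mse_grad(w, xs_i, ys_i) for xs_i, ys_i in zip(x_shards, y_shards)]
    # All-reduce (mean) across devices.
    return sum(local_grads) / num_devices


xs = [float(x) for x in range(1, 9)]
ys = [2.0 * x + 1.0 for x in xs]
print(mse_grad(0.5, xs, ys), data_parallel_grad(0.5, xs, ys, num_devices=4))
```

With equal shard sizes, the averaged per-device gradient equals the full-batch gradient, which is why splitting a batch across GPUs does not change the update (up to floating-point rounding).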
-
# Describe the bug
Hi @araistrick, thanks for this excellent work! I want to ask about the speed of video generation with InfiniGen. Under the default settings (24 fps, 192 frames), which means genera…
-
Hello,
I currently have a JAX program that is `(p)jitted` and runs on 8 devices. I want to scale it up to 32 devices by running 4 replicas of this program (and only doing a `lax.pmean` at the end of e…
luyug updated 3 weeks ago
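The replica setup in the JAX question above relies on a simple property: averaging within each replica and then averaging across replicas gives the same result as averaging over all 32 devices, provided every replica covers the same number of devices. The pure-Python sketch below checks only that arithmetic. It does not use JAX, `hierarchical_mean` is a hypothetical helper, and it makes no claim about how `lax.pmean` should be wired into the original program.

```python
from statistics import fmean
from typing import Sequence


def hierarchical_mean(values: Sequence[float], num_replicas: int,
                      devices_per_replica: int) -> float:
    """Two-stage mean: within each replica, then across replicas.

    Assumes equal-sized replicas; otherwise the mean of means is not the global mean.
    """
    if len(values) != num_replicas * devices_per_replica:
        raise ValueError("expected num_replicas * devices_per_replica values")
    d = devices_per_replica
    # Stage 1: each replica reduces over its own devices.
    replica_means = [fmean(values[r * d:(r + 1) * d]) for r in range(num_replicas)]
    # Stage 2: one final mean across replicas (the role of a cross-replica pmean).
    return fmean(replica_means)


per_device = [float(i) for i in range(32)]  # hypothetical per-device values
print(hierarchical_mean(per_device, num_replicas=4, devices_per_replica=8))
```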
-
## Background information
### What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
4.1.5
### Describe how Open MPI was installed (e.g., from a so…
-
@RogerChern I am training tridentNet_1x with resnet50 on 4 GPUs (on a machine with 8 GPUs), and it takes 2 days. Moreover, when other people use the remaining GPUs on my machine, the training speed of my models is slowe…
-
Currently, the BERT fine-tuning scripts for MRPC and SQuAD each use only a single GPU. It would be great to extend the scripts so that multiple GPUs can be used to accelerate training.
I am working…
-
**Describe the bug**
Multi-GPU training runs into a problem when I remove --single gpu
**Expected behavior**
It should detect the available GPUs.
![image](https://user-images.githubusercontent.com/73253952…
-
**Describe the bug**
Code source: https://github.com/microsoft/DeepSpeedExamples/blob/master/inference/huggingface/stable-diffusion/test-stable-diffusion.py
**To Reproduce**
I've tried both the l…
dyedd updated
11 months ago
-
### Feature Idea
Allow memory to be split across GPUs. With the arrival of Flux, even 24 GB cards are maxed out, and models have to be swapped in and out during image creation, which is slow. If y…
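One common way to split a model that does not fit on a single card is to place consecutive layers on different GPUs (naive model parallelism). The sketch below only computes such a placement with a greedy first-fit pass. The layer sizes and per-GPU budgets are hypothetical numbers, not measurements of Flux, `assign_layers` is a hypothetical helper, and no tensors are actually moved.

```python
from typing import List, Sequence


def assign_layers(layer_sizes_gb: Sequence[float],
                  gpu_budgets_gb: Sequence[float]) -> List[int]:
    """Greedily place consecutive layers on GPUs in order.

    Moves to the next GPU once the current one cannot hold the next layer.
    Only weight sizes are counted (assumption); activations are ignored.
    """
    placement: List[int] = []
    gpu, used = 0, 0.0
    for i, size in enumerate(layer_sizes_gb):
        # Advance to the next GPU while this layer does not fit on the current one.
        while gpu < len(gpu_budgets_gb) and used + size > gpu_budgets_gb[gpu]:
            gpu, used = gpu + 1, 0.0
        if gpu == len(gpu_budgets_gb):
            raise MemoryError(f"layer {i} ({size} GB) does not fit on any remaining GPU")
        placement.append(gpu)
        used += size
    return placement


# Hypothetical: six 6 GB blocks across two 24 GB cards.
print(assign_layers([6.0] * 6, [24.0, 24.0]))
```

Keeping layers contiguous means only one activation transfer per GPU boundary, at the cost of the GPUs working one after another rather than in parallel.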