-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…
-
### 💡 Your Question
Training starts out fine, but after a few epochs it stops and I get an error that looks like this:
torch.distributed.elastic.multiprocessing.api.Si…
-
## 🐛 Bug
Hi, we are using Lightning with litdata on our local machine and the AWS S3 system. However, training hangs randomly during the very first iterations when using DDP with a remote cloud directory.
…
-
Thank you for your fantastic work!
I'm curious why you only use single-GPU training in this example. Is it possible to train your model on multiple GPUs using PyTorch DDP?
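For reference, a minimal PyTorch DDP setup looks roughly like the sketch below. This is a generic illustration, not this repository's training script: the toy linear model is an assumption, and it runs as a single process on CPU with the `gloo` backend so it can be executed anywhere (a real multi-GPU job would be launched with `torchrun` and use `nccl`).

```python
# Minimal DDP sketch: single process (world_size=1) on CPU with the gloo
# backend. In a real multi-GPU job, torchrun sets RANK/WORLD_SIZE per process
# and you would pass backend="nccl" with one process per GPU.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29511")
dist.init_process_group("gloo", rank=0, world_size=1)

# Toy model, an assumption for illustration only.
model = DDP(torch.nn.Linear(8, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(4, 8)
y = torch.randint(0, 2, (4,))
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()  # with world_size > 1, gradients are all-reduced across ranks here
opt.step()

dist.destroy_process_group()
print("loss:", float(loss))
```

With more than one process, DDP keeps a full model replica per rank and synchronizes gradients during `backward()`, so each rank should see a distinct shard of the data (typically via `DistributedSampler`).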
-
1. Remove the words "YES" and "NO" from product titles because of the sick evaluation process, or use
> `return logits[:, 1][-1:], gold[-1:]`
in function preprocess_logits_for_metrics…
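For context, the slicing in the quoted line keeps only the class-1 logit of the last example together with its gold label. A standalone illustration with dummy tensors (the tensor values and shapes here are assumptions, not the actual project data):

```python
import torch

def preprocess_logits_for_metrics(logits, gold):
    # Keep only the class-1 logit of the last example, plus its gold label,
    # so the metric sees a single score/label pair instead of the full batch.
    return logits[:, 1][-1:], gold[-1:]

logits = torch.tensor([[0.2, 0.8], [0.9, 0.1], [0.3, 0.7]])
gold = torch.tensor([1, 0, 1])
scores, labels = preprocess_logits_for_metrics(logits, gold)
print(scores, labels)  # tensor([0.7000]) tensor([1])
```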
-
### Describe the bug
Hello, training XTTSv2 with DDP leads to strange training lags: training gets stuck with no errors.
6× RTX A6000 and 512 GB RAM.
Here is a monitoring graph of GPU load. Purple -…
-
Thank you very much for providing the DDP code; it is good work. But when I used ResNet-50 for monocular + stereoscopic training, I found that the training results disappeared. Could you please…
-
### Feature request
Enable use of IterableDataset when training with NeuronTrainer and DDP. Or is there a design limitation that prevents this?
I can't share the project code, but see below anot…
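A common pattern for using an `IterableDataset` under DDP is to shard the stream manually, since `DistributedSampler` only works with map-style datasets. The sketch below is a generic PyTorch illustration, not NeuronTrainer-specific; the rank and world-size values are hardcoded to simulate two DDP ranks (a real job would read them from `dist.get_rank()` / `dist.get_world_size()`):

```python
import torch
from torch.utils.data import DataLoader, IterableDataset, get_worker_info

class ShardedStream(IterableDataset):
    """Yield only the samples belonging to this DDP rank (and dataloader worker)."""

    def __init__(self, n, rank, world_size):
        self.n, self.rank, self.world_size = n, rank, world_size

    def __iter__(self):
        # Also account for DataLoader worker processes within each rank.
        info = get_worker_info()
        workers = info.num_workers if info else 1
        wid = info.id if info else 0
        shard = self.rank * workers + wid
        nshards = self.world_size * workers
        for i in range(self.n):
            if i % nshards == shard:
                yield torch.tensor(i)

# Simulate two DDP ranks; each sees a disjoint half of the 10 samples.
r0 = [int(x) for x in DataLoader(ShardedStream(10, rank=0, world_size=2), batch_size=None)]
r1 = [int(x) for x in DataLoader(ShardedStream(10, rank=1, world_size=2), batch_size=None)]
print(r0, r1)  # [0, 2, 4, 6, 8] [1, 3, 5, 7, 9]
```

Note that with uneven stream lengths the ranks can yield different numbers of batches, which itself can cause DDP to hang at the end of an epoch; trimming to the shortest shard is one common workaround.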
-
### Feature request
Add DDP support for XPU devices, like CUDA: the Trainer automatically uses multiple CUDA devices with the help of Accelerate.
The Trainer should be able to detect and use multiple XPU devices by def…