-
### Prerequisite
- [X] I have searched [Issues](https://github.com/open-mmlab/mmcv/issues) and [Discussions](https://github.com/open-mmlab/mmcv/discussions) but cannot get the expected help.
- [X]…
-
I wanted to make an issue for this instead of constantly asking in Discord.
I saw the other ticket for multi-GPU FP16 training, which is also nice. But DDP would let users scale up training that can happ…
-
## 🚀 Feature
[Documentation says](https://lightning.ai/docs/pytorch/latest/advanced/compile.html#limitations) that `torch.compile` is not supported with distributed training right now. Since torch co…
-
When training with DDP, it gets stuck on validation.
Any suggestions?
-
Added `devices="auto"` in `train.py` to utilize multiple GPUs:
```
trainer: Trainer = hydra.utils.instantiate(cfg.trainer,
callbacks=callbacks,
…
```
-
### Bug description
![Screenshot 2024-11-16 201845](https://github.com/user-attachments/assets/b134f148-cdc3-435d-94cf-25aa117e103e)
I initialized my trainer:
```
trainer = L.Trainer(max_epochs=5…
```
-
I'm using WebDataset in DDP training. Everything works fine when I set `num_workers` to 0, but if `num_workers > 0`, the total number of steps in an epoch is wrong.
```python
dat = Webdataset(url,8000,2,Tru…
```
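A common cause of this symptom (not necessarily the poster's exact bug) is the standard iterable-dataset pitfall: with `num_workers > 0`, each DataLoader worker runs its own copy of the iterator, and if the stream is not split per worker, every worker replays the full dataset, multiplying the observed epoch length. The following stdlib-only sketch illustrates the arithmetic; the sample and batch counts are made up for illustration:

```python
import math

def naive_steps(num_samples, batch_size):
    # Steps a single process would expect per epoch.
    return math.ceil(num_samples / batch_size)

def unsplit_iterable_steps(num_samples, batch_size, num_workers):
    # With an IterableDataset, each DataLoader worker iterates its own
    # copy of the stream. If the stream is NOT sharded per worker, every
    # worker yields all samples, so the epoch appears num_workers times
    # longer than expected.
    if num_workers == 0:
        return naive_steps(num_samples, batch_size)
    return naive_steps(num_samples, batch_size) * num_workers

print(naive_steps(8000, 32))                  # 250 expected steps
print(unsplit_iterable_steps(8000, 32, 2))    # 500 observed steps: data duplicated
```

WebDataset's own sharding helpers (or pinning the epoch length explicitly) are the usual remedies, so that each worker sees a disjoint slice of the shards and the step count matches the single-worker case.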
-
### System Info
Nvidia A100
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### 🐛 Describe the bug
When training a model with the asr_librispeech script, I get a lo…
-
This is a follow-up to #913
# Motivation
Add full support for multi-process and multi-GPU training in alf with pytorch's [DDP](https://pytorch.org/docs/stable/notes/ddp.html).
# Goals
- […
-
### Description & Motivation
In the example below, the model is compiled and `DDPStrategy` is passed to the Trainer; then, during the `fit` method, `DDPStrategy` is applied, so `forward` is compiled but `_pre_…