-
When I run this example [on multiple GPUs using Distributed Data Parallel (DDP) training](https://docs.lightly.ai/self-supervised-learning/examples/simclr.html) on AWS SageMaker with 4 GPUs and …
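For context, this is a minimal sketch of how I launch it, assuming the example is driven by a PyTorch Lightning `Trainer` as in the linked docs; `SimCLR` and `dataloader` stand in for the objects built earlier in that example:
```python
# Minimal sketch (assumption): multi-GPU DDP launch for the linked SimCLR example
# via PyTorch Lightning. Only the Trainer arguments are the point here.
import pytorch_lightning as pl

model = SimCLR()  # placeholder: LightningModule from the linked example

trainer = pl.Trainer(
    max_epochs=10,
    devices=4,               # one process per GPU
    accelerator="gpu",
    strategy="ddp",          # DistributedDataParallel
    sync_batchnorm=True,     # keep BatchNorm statistics consistent across ranks
)
trainer.fit(model=model, train_dataloaders=dataloader)
```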
-
Hi, I was wondering if there are any efforts toward great.py natively supporting Distributed Data Parallel? Currently I am working around it by editing my own trainer file and saving the model via torch.save.…
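Roughly, the workaround looks like the sketch below, assuming a standard PyTorch setup launched with `torchrun`; `build_model` is a placeholder and none of this reflects great.py's actual API:
```python
# Minimal sketch (assumption): wrap the model in DistributedDataParallel inside
# a custom trainer and checkpoint it with torch.save.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")            # launched via torchrun, one process per GPU
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = build_model().cuda(local_rank)     # placeholder model factory
model = DDP(model, device_ids=[local_rank])

# ... training loop ...

if dist.get_rank() == 0:
    # unwrap .module so the checkpoint can be loaded without DDP later
    torch.save(model.module.state_dict(), "checkpoint.pt")
```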
-
In my understanding, the pretraining code broadcasts the data from TP rank 0 to the rest of the GPUs in the tensor-parallel group.
However, if I activate the option `train_valid_test_datasets_provider.is_distributed = True`, wh…
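For reference, a minimal sketch of that broadcast pattern with plain `torch.distributed`, assuming a tensor-parallel process group `tp_group` and the source's global rank are already available; this is an illustration, not the repository's actual implementation:
```python
# Minimal sketch (assumption): TP rank 0 builds the batch and broadcasts it to
# the other ranks of its tensor-parallel group.
import torch
import torch.distributed as dist

def broadcast_batch(batch, src_global_rank, tp_group, device):
    """Broadcast `batch` from the group's first rank to every rank in `tp_group`.

    Non-source ranks pass a placeholder tensor with the agreed shape and dtype
    (e.g. torch.empty(...)); dist.broadcast overwrites it in place.
    """
    tensor = batch.contiguous().to(device)
    dist.broadcast(tensor, src=src_global_rank, group=tp_group)
    return tensor
```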
-
In Chapel it's possible to think you are writing a distributed parallel loop but end up creating something that runs locally. The following code sample demonstrates this:
```chapel
use BlockDist;
…
```
-
I believe a useful feature would be to implement a wrapper for the PyTorch DistributedDataParallel layer.
My personal motivation for this is to be able to use things like synchronized batch…
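As a rough illustration of what the wrapper would cover, here is a minimal sketch using stock PyTorch (SyncBatchNorm conversion plus DDP); `MyModel` is a placeholder:
```python
# Minimal sketch (assumption): convert BatchNorm layers to SyncBatchNorm and
# wrap the model in DistributedDataParallel.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")                 # one process per GPU, e.g. via torchrun
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = MyModel().cuda(local_rank)              # placeholder model
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)  # sync BN stats across ranks
model = DDP(model, device_ids=[local_rank])
```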
-
While fine-tuning the VCR task in Distributed Data Parallel mode, training hangs when loading the model onto the GPU.
-
Platforms: linux
This test was disabled because it is failing in CI. See [recent examples](https://hud.pytorch.org/flakytest?name=test_manual_with_data_parallel_dp_type_DDP_ScheduleClass0_use_new_run…
-
I'm a PyTorch and MXNet user, and `Flux` looks promising to me. I have 8 GPUs on the server and I want to train my model faster. Unfortunately, I see no documentation about parallel training on multiple GP…
-
## 🐛 Bug
While training a translation model, evaluation fails under distributed data parallel mode.
### To Reproduce
Steps to reproduce the behavior (**always include the command you ra…
-
### Feature request
The current DataLoader implementation in this repository underperforms due to a lack of efficient parallelization. This often results in the CPU handling data preproc…
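As a rough illustration of the kind of parallelization meant here, a minimal sketch with `torch.utils.data.DataLoader` worker processes (`MyDataset` is a placeholder, not this repository's loader):
```python
# Minimal sketch (assumption): worker processes run preprocessing off the
# training loop's critical path.
from torch.utils.data import DataLoader

loader = DataLoader(
    MyDataset(),             # placeholder dataset doing CPU-side preprocessing
    batch_size=32,
    shuffle=True,
    num_workers=8,           # preprocess in parallel worker processes
    pin_memory=True,         # faster host-to-GPU copies
    prefetch_factor=2,       # batches prefetched per worker
    persistent_workers=True, # keep workers alive across epochs
)
```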