-
It would be interesting to have benchmarks of DDP code, especially of code that works heavily with lists.
That would make it possible to find out which places in the compiler/in the list implem…
bafto updated 3 months ago
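Not part of the original issue: a minimal sketch of what such a benchmark driver could look like, timing repeated runs of a compiled DDP program that exercises lists. The `kddp kompiliere` invocation, the file names, and the output binary are assumptions about the toolchain, not confirmed commands.

```python
import statistics
import subprocess
import time

# Hypothetical commands/paths; adjust to the real DDP toolchain.
COMPILE_CMD = ["kddp", "kompiliere", "listen_benchmark.ddp", "-o", "listen_benchmark"]
RUN_CMD = ["./listen_benchmark"]
RUNS = 10

def bench() -> None:
    # Compile once, outside the timed loop, so only the runtime of the
    # list-heavy program is measured, not the compiler itself.
    subprocess.run(COMPILE_CMD, check=True)

    samples = []
    for _ in range(RUNS):
        start = time.perf_counter()
        subprocess.run(RUN_CMD, check=True)
        samples.append(time.perf_counter() - start)

    print(f"median {statistics.median(samples):.4f}s  "
          f"min {min(samples):.4f}s  max {max(samples):.4f}s")

if __name__ == "__main__":
    bench()
```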
-
The release `Spider-Man.3.2007.1080p.BluRay.DDP.5.1.x264-GROUP` is not being parsed correctly.
```
[
"release" => "Spider-Man.3.2007.1080p.BluRay.DDP.5.1.x264-GROUP",
"title" => "Spi…
```
-
## Instructions To Reproduce the Issue:
To speed up training, I add a torch.compile call after DistributedDataParallel in detectron2/engine/defaults.py:
```
ddp = DistributedDataParallel(mo…
```
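Not from the original report, but a minimal sketch of the pattern being described: wrap the model in DistributedDataParallel first, then apply torch.compile to the wrapped module. The model and the process-group setup are placeholders rather than detectron2's actual code.

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel

# Placeholder setup; detectron2's launch utilities normally handle this.
# Launch with: torchrun --nproc_per_node=<num_gpus> repro.py
dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(1024, 1024).cuda()  # stand-in for the real model

# Wrap in DDP first, then compile the DDP module, as in the report.
ddp = DistributedDataParallel(model, device_ids=[local_rank])
compiled = torch.compile(ddp)

x = torch.randn(8, 1024, device="cuda")
compiled(x).sum().backward()

dist.destroy_process_group()
```

Depending on the PyTorch version, the other commonly used ordering is to compile the inner model first and then wrap the compiled module in DDP.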
-
### Describe the bug
Hello, training XTTSv2 leads to weird training lags when using DDP: training gets stuck with no errors.
6x RTX A6000 and 512 GB RAM.
Here is the GPU load monitoring graph. Purple -…
-
### 🐛 Describe the bug
Running torch.func transforms (tested vmap and jacfwd) on a DDP module with buffer broadcasting causes an exception.
```python
import torch …
```
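Not the reporter's code, but a rough sketch of the combination being described: a DDP-wrapped module that owns a buffer (with `broadcast_buffers` left at its default of True) passed through a torch.func transform. The exact module and inputs are placeholders; depending on the PyTorch version, the last line is where the buffer-broadcast logic meets the transform.

```python
import torch
import torch.distributed as dist
from torch.func import vmap
from torch.nn.parallel import DistributedDataParallel

# Single-process group just so DDP can be constructed (placeholder setup).
dist.init_process_group(
    "gloo", init_method="tcp://127.0.0.1:29500", rank=0, world_size=1
)

class WithBuffer(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)
        self.register_buffer("scale", torch.ones(4))  # buffer => buffer broadcasting

    def forward(self, x):
        return self.linear(x) * self.scale

# broadcast_buffers defaults to True, the combination mentioned in the report.
ddp = DistributedDataParallel(WithBuffer())

x = torch.randn(8, 4)
out = vmap(ddp)(x)  # torch.func transform applied over the DDP forward

dist.destroy_process_group()
```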
-
### 🐛 Describe the bug
Repro:
`pytest -rA test/distributed/_composable/test_replicate_with_compiler.py::DDP_TP_Test::test_ddp_tp`
Error:
```
File "/data/users/willfeng/pytorch_top/torch/_dy…
```
-
Training script
```
--model_type minicpm-v-v2_5-chat \
--model_id_or_path /data/MiniCPM-V/pretrained/MiniCPM-Llama3-V-2_5 \
--dataset /data/swift/finetune/train_0703.jsonl \
--ddp_find_unused_pa…
```
-
Wanted to make an issue for this instead of constantly asking in Discord.
I saw the other ticket for multi-GPU FP16 training, which is also nice. But DDP would let users scale up training that can happ…
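Not part of the original request: a minimal sketch of the standard PyTorch DDP pattern such a feature would build on (one process per GPU launched via torchrun, a DistributedSampler to shard the data); the model and dataset are placeholders.

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    # Launch with: torchrun --nproc_per_node=<num_gpus> train_ddp.py
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(32, 1).cuda()                # placeholder model
    ddp_model = DistributedDataParallel(model, device_ids=[local_rank])
    optim = torch.optim.AdamW(ddp_model.parameters(), lr=1e-3)

    dataset = TensorDataset(torch.randn(1024, 32), torch.randn(1024, 1))
    sampler = DistributedSampler(dataset)                 # shards data across ranks
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    for epoch in range(2):
        sampler.set_epoch(epoch)                          # reshuffle each epoch
        for x, y in loader:
            x, y = x.cuda(), y.cuda()
            loss = torch.nn.functional.mse_loss(ddp_model(x), y)
            optim.zero_grad()
            loss.backward()                               # DDP all-reduces gradients here
            optim.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```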
-
Hi, when I run the training code, I get the following error. Can you give me some advice?
` File "/ssd5/exec/liyj/miniconda3/envs/seamless/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 1…