-
When I use multi-GPU training, I encounter the following problem:
subprocess.CalledProcessError: Command '['/home/a/anaconda3/envs/mambayolo/bin/python', '-m', 'torch.distributed.run', '--nproc_per…
-
Why does the code have `parser.add_argument('--local_rank', type=int, default=-1, help='DDP parameter, do not modify')`? If I want to use DDP, should I change the default to 0?
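As a point of reference, a minimal sketch (not taken from the repository in question) of how `--local_rank` is typically resolved: `torchrun` / `python -m torch.distributed.run` set the `LOCAL_RANK` environment variable for every worker, and the older `torch.distributed.launch` passed `--local_rank` as an argument, so the `-1` default is only a sentinel for "not launched in DDP mode" and normally does not need to be edited.

```python
# Sketch: how the -1 default usually interacts with the DDP launcher.
import argparse
import os

import torch

parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=-1,
                    help='DDP parameter, do not modify')
args = parser.parse_args()

# Prefer the env var set by torchrun/torch.distributed.run; fall back to the
# CLI argument used by the older torch.distributed.launch.
local_rank = int(os.environ.get('LOCAL_RANK', args.local_rank))

if local_rank != -1:
    # Launched by a DDP launcher: bind this process to its own GPU.
    torch.cuda.set_device(local_rank)
    torch.distributed.init_process_group(backend='nccl')
# Otherwise the script runs as a plain single-process job.
```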
-
Hi @chrockey, great work!
Can you guide me on how to set up multi-GPU training? I only have 20 GB GPUs available, and when using a batch size of 2 I get poor performance (~6% lower mIoU and mAcc; pr…
-
I am trying to train the CausalVAE on my own dataset on 4 GPUs, but all of the memory is used by device 0 alone. Is distributed processing not incorporated into the training code?
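For context, memory piling up on device 0 usually means the job is running as a single process (or replicating through GPU 0 via `nn.DataParallel`) rather than one process per GPU. Below is a minimal, hedged sketch of the one-process-per-GPU `DistributedDataParallel` pattern, assuming a launch like `torchrun --nproc_per_node 4 train.py`; the linear model and random data are placeholders standing in for the real CausalVAE and dataset.

```python
# Per-rank DDP sketch: each process owns exactly one GPU.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def main():
    local_rank = int(os.environ['LOCAL_RANK'])  # set by torchrun
    torch.cuda.set_device(local_rank)           # keep this rank off GPU 0
    dist.init_process_group(backend='nccl')

    model = torch.nn.Linear(128, 128).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(1024, 128))      # placeholder data
    sampler = DistributedSampler(dataset)                 # shard data per rank
    loader = DataLoader(dataset, batch_size=8, sampler=sampler)

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for epoch in range(2):
        sampler.set_epoch(epoch)
        for (x,) in loader:
            x = x.cuda(local_rank, non_blocking=True)
            loss = model(x).pow(2).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    dist.destroy_process_group()


if __name__ == '__main__':
    main()
```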
-
Same here. I had finished pretraining LlaMA-3.1-7B-Instruct and then continued fine-tuning with QLoRA normally. After 2 epochs, I switched to Unsloth to continue the fine-tuning with a longer context (80…
-
### 🚀 The feature, motivation and pitch
## Motivation: Limitation of Existing Profiling Approach
To conduct PyTorch distributed training performance analysis, the currently recommended way is to profil…
-
I am using two machines, each with 4 GPUs. I run the command: accelerate launch --dynamo_backend no --machine_rank 0 --main_process_ip 192.168.68.249 --main_process_port 27828 --mixed_precision no --multi_gpu --num_machines 2 --num_processe…
-
I was trying to fine-tune Meta-Llama-3-8B-Instruct using 4 GPUs with the following command:
`torchrun --nproc_per_node 4 -m training.run --output_dir llama3test --model_name_or_path meta-llama/Met…
-
### Bug description
The DDP training gets stuck at the 1st iteration, and it is always waiting on a pid:
![image](https://github.com/user-attachments/assets/e85d5e39-a24e-41e0-8bea-bcaa004a3473)
os.waitpid()…
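As a first diagnostic step (an assumption about typical DDP-hang debugging, not a fix for this specific report), a hedged sketch of turning on verbose NCCL / torch.distributed logging and a shorter collective timeout so a stuck rank fails loudly instead of waiting indefinitely:

```python
# Diagnostic sketch only: enable debug logging before initializing DDP.
# These env vars are often set in the launch environment instead; shown in
# Python here for brevity, before any process group is created.
import datetime
import os

import torch.distributed as dist

os.environ.setdefault('NCCL_DEBUG', 'INFO')                  # NCCL-level logs
os.environ.setdefault('TORCH_DISTRIBUTED_DEBUG', 'DETAIL')   # extra DDP checks

dist.init_process_group(
    backend='nccl',
    timeout=datetime.timedelta(minutes=5),  # fail fast instead of hanging
)
```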
-
Hi, I'm using the tutorial https://github.com/pytorch/tutorials/blob/master/intermediate_source/ddp_tutorial.rst for DDP training with 4 GPUs in my own code, following the Basic Use Case. But when I …
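For readers following along, here is a condensed, hedged sketch of the "Basic Use Case" pattern from the linked DDP tutorial; the toy model, port, and world size are illustrative details, not the asker's actual code.

```python
# Condensed DDP basic-use-case sketch: one spawned process per GPU.
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def demo_basic(rank, world_size):
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '12355'
    dist.init_process_group('nccl', rank=rank, world_size=world_size)

    model = nn.Linear(10, 5).to(rank)           # toy model on this rank's GPU
    ddp_model = DDP(model, device_ids=[rank])

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.001)
    outputs = ddp_model(torch.randn(20, 10).to(rank))
    loss = nn.functional.mse_loss(outputs, torch.randn(20, 5).to(rank))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()


if __name__ == '__main__':
    world_size = 4                               # one process per GPU
    mp.spawn(demo_basic, args=(world_size,), nprocs=world_size, join=True)
```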