A lightweight library designed to accelerate the process of training PyTorch models by providing a minimal but extensible training loop that is flexible enough to handle the majority of use cases and capable of utilizing different hardware options with no code changes required. Docs: https://pytorch-accelerated.readthedocs.io/en/latest/
Apache License 2.0
Distributed training results in slow convergence #59
Hi,
I am using the sample code for timm model training. There is a mismatch in results when I accelerate the code with a GPU versus when I do not. What could be the reason for this?
There are 3 results in the image:
`baseline_batch-32` is the result of just running `python train.py`
`baseline_batch-32_nodist` is the result of using accelerate config `accelerate_config_nodist.yaml`
`baseline_batch-32_1gpu` is the result of using accelerate config `accelerate_config_1gpu.yaml`
The config for nodist is
```yaml
compute_environment: LOCAL_MACHINE
distributed_type: 'NO'
downcast_bf16: 'no'
gpu_ids: '1'
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
The config for 1 gpu is
```yaml
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
downcast_bf16: 'no'
gpu_ids: 3,
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
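One common cause of convergence differences when switching to distributed training is the change in effective batch size: under data-parallel training, each of the N processes draws its own batch, so the optimizer effectively steps on `per_device_batch * N` samples, and the learning rate is often scaled to compensate. Note that in both configs above `num_processes` is 1, so this particular effect should not apply here, and the cause may lie elsewhere (e.g. a different GPU, nondeterminism, or DDP wrapping side effects). A minimal sketch of the arithmetic, assuming a per-device batch size of 32 (the function names and numbers are illustrative, not part of the library):

```python
def effective_batch_size(per_device_batch: int, num_processes: int) -> int:
    # Each data-parallel process sees its own batch per step,
    # so gradients are averaged over per_device_batch * num_processes samples.
    return per_device_batch * num_processes


def linearly_scaled_lr(base_lr: float, num_processes: int) -> float:
    # The linear scaling rule is a common heuristic (not a guarantee):
    # multiply the single-process learning rate by the number of processes.
    return base_lr * num_processes


# With num_processes = 1 (as in both configs), nothing changes:
print(effective_batch_size(32, 1))      # same as the baseline batch size
# With e.g. 2 GPUs, the effective batch doubles and the LR is often doubled too:
print(effective_batch_size(32, 2))
print(linearly_scaled_lr(1e-3, 2))
```

If results still diverge with a single process, it may be worth pinning the same physical GPU in both runs (the configs above select GPU `1` and GPU `3` respectively) and fixing random seeds before comparing curves.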