How should I use the "accelerate config" command to generate /home/duser/.cache/huggingface/accelerate/default_config.yaml?
I have tried the two configurations below. Both let accelerate launch train.py run successfully, but model-parallel training does not seem to work. How should accelerate be configured?
(1)
In which compute environment are you running?
This machine
Which type of machine are you using?
multi-GPU
How many different machines will you use (use more than 1 for multi-node training)? [1]:
Do you wish to optimize your script with torch dynamo?[yes/NO]:No
Do you want to use DeepSpeed? [yes/NO]: No
Do you want to use FullyShardedDataParallel? [yes/NO]: NO
Do you want to use Megatron-LM ? [yes/NO]: NO
How many GPU(s) should be used for distributed training? [1]:5
What GPU(s) (by id) should be used for training on this machine as a comma-separated list? [all]:1,2,3,4,5
Do you wish to use FP16 or BF16 (mixed precision)?
bf16
accelerate configuration saved at /home/duser/.cache/huggingface/accelerate/default_config.yaml
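For reference, the default_config.yaml saved after configuration (1) looks roughly like the sketch below. I am reconstructing it from memory of an Accelerate ~0.18 setup, so the exact keys and defaults are an assumption and may differ in other versions:

compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU   # DistributedDataParallel: each GPU holds a full model copy
downcast_bf16: 'no'
gpu_ids: 1,2,3,4,5
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 5
rdzv_backend: static
same_network: true
use_cpu: false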
(2)
In which compute environment are you running?
This machine
Which type of machine are you using?
multi-GPU
How many different machines will you use (use more than 1 for multi-node training)? [1]:
Do you wish to optimize your script with torch dynamo?[yes/NO]: NO
Do you want to use DeepSpeed? [yes/NO]: NO
Do you want to use FullyShardedDataParallel? [yes/NO]: yes
What should be your sharding strategy?
FULL_SHARD
Do you want to offload parameters and gradients to CPU? [yes/NO]: yes
What should be your auto wrap policy?
NO_WRAP
What should be your FSDP's backward prefetch policy?
BACKWARD_PRE
What should be your FSDP's state dict type?
FULL_STATE_DICT
How many GPU(s) should be used for distributed training? [1]:5
Do you wish to use FP16 or BF16 (mixed precision)?
no
accelerate configuration saved at /home/duser/.cache/huggingface/accelerate/default_config.yaml
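And the file saved after configuration (2) comes out roughly as below. Again this is only a sketch; the fsdp_config key names and value formats change between Accelerate versions, so treat them as assumptions:

compute_environment: LOCAL_MACHINE
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
  fsdp_auto_wrap_policy: NO_WRAP
  fsdp_backward_prefetch_policy: BACKWARD_PRE
  fsdp_offload_params: true
  fsdp_sharding_strategy: 1   # 1 corresponds to FULL_SHARD in this version
  fsdp_state_dict_type: FULL_STATE_DICT
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 5
rdzv_backend: static
same_network: true
use_cpu: false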