Open wenyang001 opened 5 months ago
@wenyang001
You can run `accelerate config` to set up the configuration for multi-GPU training. `gradient_accumulate_every` is set to 8 in the config file, which means that the model's forward function is called 8 times before the parameters are updated. Thus, the effective batch size is 2 × 8 = 16.
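For reference, here is a minimal sketch of what gradient accumulation does, written in plain Python rather than taken from the repo's training code. The toy one-parameter model and the names `grad_mse`, `train_accumulated`, and `effective_batch_size` are hypothetical and only for illustration; the `num_gpus` factor assumes the batch size of 2 is per process, which is an assumption, not something confirmed in this thread.

```python
# Sketch only: gradient accumulation on a toy model y = w * x.
# All names here are illustrative, not the repo's API.

def grad_mse(w, batch):
    """Mean gradient of (w*x - y)^2 with respect to w over (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def train_accumulated(data, batch_size=2, accumulate_every=8, lr=0.01, w=0.0):
    """Run one pass over data, updating w once every `accumulate_every` batches."""
    batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
    accum = 0.0
    updates = 0
    for step, batch in enumerate(batches, start=1):
        # Forward/backward on a small batch; scale so the accumulated sum
        # equals the mean gradient over the whole effective batch.
        accum += grad_mse(w, batch) / accumulate_every
        if step % accumulate_every == 0:
            w -= lr * accum  # single parameter update
            accum = 0.0
            updates += 1
    return w, updates


def effective_batch_size(per_step_batch, accumulate_every, num_gpus=1):
    """Samples contributing to one update (num_gpus factor is an assumption)."""
    return per_step_batch * accumulate_every * num_gpus


if __name__ == "__main__":
    data = [(float(i), 2.0 * i) for i in range(1, 17)]  # 16 samples
    w, updates = train_accumulated(data)
    print(effective_batch_size(2, 8), updates, w)
```

With a batch size of 2 and 8 accumulation steps, the single update above uses the same gradient as one step on a batch of 16, which is why the small per-step batch size is not a limitation by itself.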
Hope this helps.

Thanks for your reply.
How many GPUs did you use to train your model? Does your model support multi-GPU training? I ask because I saw that the batch size for bsds was set to only 2.