Vindicator645 opened this issue 1 month ago
I suspect the `loss = loss / gradient_accumulation_steps` and `acc = acc / gradient_accumulation_steps` lines in deepspeed_utils should be removed.
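If that reading is right, the bug is a double scaling: DeepSpeed's `model_engine.backward()` already divides the loss by `gradient_accumulation_steps` internally, so an extra division in user code shrinks the reported loss by the same factor again (8 / 10 ≈ 0.8, matching the numbers reported below). A rough sketch of what the suspect lines presumably look like; the surrounding loop is an assumption, not the actual deepspeed_utils code:

```python
# Hypothetical reconstruction of the suspect train step in deepspeed_utils:
for step, batch in enumerate(train_dataloader):
    loss, acc = model_engine(batch)            # forward pass (shape assumed)
    loss = loss / gradient_accumulation_steps  # <- suspected extra division
    acc = acc / gradient_accumulation_steps    # <- same for the logged accuracy
    model_engine.backward(loss)                # DeepSpeed divides by gas again here
    model_engine.step()
```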
try this:

```python
model_engine.backward(loss)
if (step + 1) % model_engine.gradient_accumulation_steps() == 0:
    model_engine.step()
    model_engine.zero_grad()
```
sorry, removing the division by gradient_accumulation_steps is enough
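For concreteness, a minimal sketch of the corrected step under that fix (the loop shape is assumed, mirroring the snippet above). DeepSpeed's engine scales the loss internally, and `model_engine.step()` only runs the optimizer at an accumulation boundary, so neither the manual division nor the modulo check is needed:

```python
# Corrected train step (hypothetical loop): let DeepSpeed handle accumulation.
for step, batch in enumerate(train_dataloader):
    loss, acc = model_engine(batch)  # forward pass; no division by gas
    model_engine.backward(loss)      # scales by 1/gradient_accumulation_steps internally
    model_engine.step()              # no-op until the accumulation boundary
```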
System Info
NVIDIA A100
Information
🐛 Describe the bug
When training a model with the asr_librispeech script, I get a loss of around 8 initially. With DDP I also get around 8, including with gradient accumulation. But with DeepSpeed, gradient_accumulation_steps=1 gives an initial loss of 8 while gradient_accumulation_steps=10 gives 0.8. Separately, setting gradient accumulation in ds_config appears to do nothing: gradient_accumulation_steps=10000 takes the same time as gradient_accumulation_steps=1.
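For reference, this is roughly where gradient_accumulation_steps is normally set when initializing DeepSpeed from a config dict (values illustrative; whether the asr_librispeech script actually reads it from ds_config or overrides it from command-line args is an assumption worth checking, since an override would explain the setting being ignored):

```python
import deepspeed

# Illustrative config; keys are standard DeepSpeed config options.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 10,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,                          # `model` assumed defined elsewhere
    model_parameters=model.parameters(),
    config=ds_config,
)
```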
Error logs
loss=8 for gradient_accumulation_steps=1 and loss=0.8 for gradient_accumulation_steps=10
Expected behavior
The loss should be of the same magnitude regardless of gradient_accumulation_steps.
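One quick way to confirm the double scaling (a hypothetical logging snippet, assuming access to the loop above): print the raw loss before calling backward; with the extra division removed it should sit near 8 for any gradient_accumulation_steps value:

```python
# Hypothetical check: log the unscaled loss so values are comparable
# across gradient_accumulation_steps settings.
if step % 10 == 0:
    gas = model_engine.gradient_accumulation_steps()
    print(f"step {step}: raw loss {loss.item():.3f} (gas={gas})")
model_engine.backward(loss)
model_engine.step()
```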