Ah, we need to update the docs. Can you try the ngoyal_bf16_changes branch for fairscale?
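One way to pick up that branch (a sketch; the upstream fairscale repository URL is my assumption here, adjust if you use a fork):

```sh
# Sketch: reinstall fairscale from the ngoyal_bf16_changes branch.
# Repository URL assumed to be the upstream fairscale repo.
pip uninstall -y fairscale
pip install git+https://github.com/facebookresearch/fairscale.git@ngoyal_bf16_changes
```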
Thanks, I have tried the ngoyal_bf16_changes branch for fairscale, but still got the same error.
--bf16: use BF16 format. Currently --bf16 is an added argument: use it together with --fp16 for mixed precision bf16 training, or with --memory-efficient-fp16 for pure bf16 training.
So you may need either --fp16 or --memory-efficient-fp16 in addition to --bf16.
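For example (a minimal sketch; the model and data arguments are placeholders, only the precision and DDP flags come from this thread):

```sh
# Sketch only: <model-and-data-args> stands in for your actual task,
# architecture, and data options.

# Mixed precision bf16 training with fully sharded data parallel:
metaseq-train <model-and-data-args> \
    --ddp-backend=fully_sharded \
    --fp16 --bf16

# Pure bf16 training:
metaseq-train <model-and-data-args> \
    --ddp-backend=fully_sharded \
    --memory-efficient-fp16 --bf16
```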
It works. Thanks!
🐛 Bug
I am training a normal transformer model on multiple GPUs with metaseq-train. --ddp-backend=fully_sharded and --bf16 both work well individually, but they are incompatible with each other. Is this expected for some reason, or is it just not supported yet?
To Reproduce
Steps to reproduce the behavior (always include the command you ran):
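Roughly the shape of the command that hits the error (a sketch, not the exact command I ran; the model and data arguments are placeholders):

```sh
# Sketch: --bf16 combined with --ddp-backend=fully_sharded fails here,
# while either flag on its own works fine.
metaseq-train <model-and-data-args> \
    --ddp-backend=fully_sharded \
    --bf16
```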
When I follow the suggestion to use --ddp-backend=legacy_ddp, everything goes well. But this is not what I want.
Expected behavior
Expect --ddp-backend=fully_sharded and --bf16 to work well together.
Environment
How you installed metaseq (pip, source): pip