facebookresearch / fairscale

PyTorch extensions for high performance and large scale training.

Support BF16 for FSDP #963

Open yuvalkirstain opened 2 years ago

yuvalkirstain commented 2 years ago

Feature Request

Please support BF16 mixed-precision

Additional context

Training with BF16 is usually more stable than FP16, which is important when training large models. Additionally, many models (e.g., T5) were pre-trained in BF16, and continuing to train them with FP16 mixed precision results in NaNs.

Thank you!

anj-s commented 2 years ago

Thank you for this issue! We are currently working on adding support for bf16 and hope to have it done very soon :)

I'm assuming you meant support for bf16 with FSDP? Or were you thinking of another API?

yuvalkirstain commented 2 years ago

Exactly, bf16 with FSDP!

yuvalkirstain commented 2 years ago

@anj-s please let me know if there is anything we can do to help, having support for bf16 with FSDP in Fairseq will really really help us! :)

yuvalkirstain commented 2 years ago

Hi, has there been any progress with resolving this issue? @anj-s Thank you so much

anj-s commented 2 years ago

> Hi, has there been any progress with resolving this issue? @anj-s Thank you so much

Hi @yuvalkirstain, I think this should work without any issues. Can you try using bfloat16 by passing the right compute_dtype argument when using FSDP? Unfortunately I haven't had a chance to add a unit test, but perhaps someone else on the team has looked into this. cc @anupambhatnagar @min-xu-ai
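
For reference, a minimal sketch of what that suggestion might look like. This assumes fairscale's `FullyShardedDataParallel` accepts `mixed_precision` and `compute_dtype` arguments as described above, and that a distributed process group, model, and GPU are available; it is not a verified, tested configuration.

```python
# Minimal sketch: wrapping a model with fairscale FSDP using bfloat16 compute.
# Assumes torch.distributed has been launched (e.g. via torchrun) and that
# compute_dtype/mixed_precision behave as suggested in this thread.
import torch
import torch.distributed as dist
from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP


def wrap_model_bf16(model: torch.nn.Module) -> FSDP:
    # mixed_precision=True enables reduced-precision compute; compute_dtype
    # is assumed to override the default fp16 with bfloat16.
    return FSDP(
        model,
        mixed_precision=True,
        compute_dtype=torch.bfloat16,
    )


if __name__ == "__main__":
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = torch.nn.Linear(1024, 1024).cuda()
    fsdp_model = wrap_model_bf16(model)

    out = fsdp_model(torch.randn(8, 1024, device="cuda"))
    print(out.dtype)  # expected torch.bfloat16 if compute_dtype is honored
```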

wangleiofficial commented 2 years ago

bfloat16 support with PyTorch Lightning would be even better. Is that something you are considering?

toriving commented 2 years ago

Is there currently any progress on this issue? I'm also wondering whether it works if I just apply the branch mentioned above.

anupambhatnagar commented 2 years ago

There has been no progress on this so far.