Closed · francoishernandez closed this 1 week ago
93158fe enables amp for the `bfloat16` case, which seems to work fine.
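For reference, enabling amp for bf16 in PyTorch essentially boils down to wrapping the forward pass in `torch.autocast` with `dtype=torch.bfloat16`. A minimal sketch, not the actual eole training loop (model, criterion, and batch keys are placeholders):

```python
import torch

# Illustrative only, not the eole code path: bfloat16 autocast in PyTorch.
def training_step(model, optimizer, criterion, batch, device_type="cuda"):
    optimizer.zero_grad(set_to_none=True)
    # Unlike float16, bfloat16 keeps the fp32 exponent range, so autocast
    # can be used without a GradScaler.
    with torch.autocast(device_type=device_type, dtype=torch.bfloat16):
        logits = model(batch["src"])
        loss = criterion(logits, batch["tgt"])
    loss.backward()
    optimizer.step()
    return loss.item()
```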
- rename `precision` to `compute_dtype`
- rename `dtype` to `storage_dtype` (or `model_dtype`?)
For xlm-roberta-xl(xxl), which are natively fp32, I added this here: https://github.com/eole-nlp/eole/blob/166a18b272fb927334d109c3aa8f6e4aedf39f72/eole/bin/convert/convert_HF.py#L861 to convert them to fp16. I think, since we can convert any kind of model (more and more are in bf16), maybe by default we can keep the original dtype, but we can add a flag to force the storage in another dtype.
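A possible shape for that "flag to force the storage dtype" idea, as a sketch only (the helper name, argument, and defaulting behaviour are assumptions, not the actual convert_HF.py code):

```python
import torch

# Hypothetical helper, not the actual convert_HF.py logic.
def cast_state_dict(state_dict, target_dtype=None):
    """Cast floating-point weights to `target_dtype`, or keep the original dtype."""
    if target_dtype is None:
        # default: keep whatever dtype the checkpoint already uses (fp32, bf16, ...)
        return state_dict
    return {
        name: tensor.to(target_dtype) if tensor.is_floating_point() else tensor
        for name, tensor in state_dict.items()
    }

# e.g. force fp16 storage for natively-fp32 models such as xlm-roberta-xl:
# state_dict = cast_state_dict(state_dict, target_dtype=torch.float16)
```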
bfloat16
[plots: X = steps / X = relative time]
It seems to work relatively plug-n-play, but we might need to adapt a few things optimizer-wise:
- We might investigate some bf16-specific implementations, e.g. https://github.com/arogozhnikov/adamw_bfloat16
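One concrete reason bf16 may need optimizer-side care: with only ~8 significant bits, small updates applied directly to bf16 weights can be rounded away entirely, which is the kind of issue bf16-aware optimizers (such as the one linked above) aim to mitigate, e.g. via stochastic rounding or fp32 master weights. A standalone illustration, not eole code:

```python
import torch

# A small optimizer-style update is lost when applied directly in bfloat16,
# because the spacing between bf16 values around 1.0 is 2**-7 ~= 0.0078.
w_fp32 = torch.tensor(1.0, dtype=torch.float32)
w_bf16 = torch.tensor(1.0, dtype=torch.bfloat16)
update = 1e-3

print(w_fp32 + update)  # tensor(1.0010)
print(w_bf16 + update)  # tensor(1., dtype=torch.bfloat16) -> the update vanished
```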
precision // model_dtype homogenization
Previously, `model_dtype` was used for training, with some "precision" deduced and applied depending on some other settings (optimizer), and `precision` was set in `PredictConfig` for inference. This PR proposes a factorization of `precision` at the common `RunningConfig` level, and `dtype` (the actual dtype the model is cast to for training) is deduced with the same conditions as before.

TODOs: