Closed · francoishernandez closed this 1 week ago
93158fe enables amp for the `bfloat16` case, which seems to work fine.
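For reference, enabling amp for bf16 in PyTorch essentially boils down to wrapping the forward pass in `torch.autocast` with `dtype=torch.bfloat16`. A minimal sketch, not the actual eole training loop (model, criterion, and batch keys are placeholders):

```python
import torch

# Illustrative only, not the eole code path: bfloat16 autocast in PyTorch.
def training_step(model, optimizer, criterion, batch, device_type="cuda"):
    optimizer.zero_grad(set_to_none=True)
    # Unlike float16, bfloat16 keeps the fp32 exponent range, so autocast
    # can be used without a GradScaler.
    with torch.autocast(device_type=device_type, dtype=torch.bfloat16):
        logits = model(batch["src"])
        loss = criterion(logits, batch["tgt"])
    loss.backward()
    optimizer.step()
    return loss.item()
```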
- rename `precision` to `compute_dtype`
- rename `dtype` to `storage_dtype` (or `model_dtype`?)
For xlm-roberta-xl(xxl), which are natively fp32, I added this here: https://github.com/eole-nlp/eole/blob/166a18b272fb927334d109c3aa8f6e4aedf39f72/eole/bin/convert/convert_HF.py#L861 to convert them to fp16. I think, since we can convert any kind of model (more and more are in bf16), maybe by default we can keep the original dtype, but we can add a flag to force the storage in another dtype.
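A possible shape for that "flag to force the storage dtype" idea, as a sketch only (the helper name, argument, and defaulting behaviour are assumptions, not the actual convert_HF.py code):

```python
import torch

# Hypothetical helper, not the actual convert_HF.py logic.
def cast_state_dict(state_dict, target_dtype=None):
    """Cast floating-point weights to `target_dtype`, or keep the original dtype."""
    if target_dtype is None:
        # default: keep whatever dtype the checkpoint already uses (fp32, bf16, ...)
        return state_dict
    return {
        name: tensor.to(target_dtype) if tensor.is_floating_point() else tensor
        for name, tensor in state_dict.items()
    }

# e.g. force fp16 storage for natively-fp32 models such as xlm-roberta-xl:
# state_dict = cast_state_dict(state_dict, target_dtype=torch.float16)
```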
bfloat16
[plots: X = steps / X = relative time]
It seems to work relatively plug-n-play, but we might need to adapt a few things optimizer-wise:
- We might investigate some bf16-specific implementations, e.g. https://github.com/arogozhnikov/adamw_bfloat16
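One concrete reason bf16 may need optimizer-side care: with only ~8 significant bits, small updates applied directly to bf16 weights can be rounded away entirely, which is the kind of issue bf16-aware optimizers (such as the one linked above) aim to mitigate, e.g. via stochastic rounding or fp32 master weights. A standalone illustration, not eole code:

```python
import torch

# A small optimizer-style update is lost when applied directly in bfloat16,
# because the spacing between bf16 values around 1.0 is 2**-7 ~= 0.0078.
w_fp32 = torch.tensor(1.0, dtype=torch.float32)
w_bf16 = torch.tensor(1.0, dtype=torch.bfloat16)
update = 1e-3

print(w_fp32 + update)  # tensor(1.0010)
print(w_bf16 + update)  # tensor(1., dtype=torch.bfloat16) -> the update vanished
```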
precision // model_dtype homogenization
Previously, `model_dtype` was used for training, with some "precision" deduced and applied depending on some other settings (optimizer), and `precision` was set in `PredictConfig` for inference. This PR proposes a factorization of `precision` at the common `RunningConfig` level, and `dtype` (the actual dtype the model is cast to for training) is deduced with the same conditions as before.

TODOs: