philschmid opened this issue 1 year ago
Hi @philschmid, thank you for letting us know. We will investigate.
Hi @philschmid. We are debugging your issue. In the meantime, could you try a couple of alternatives:
1. Set `XLA_USE_BF16=1` in the environment instead of inside the class `AugmentTrainerForTrainiumMixin` in Optimum-Neuron's `trainers.py`.
2. Use PyTorch autocast instead of `XLA_USE_BF16=1`, with the following changes to the class `AugmentTrainerForTrainiumMixin` in Optimum-Neuron's `trainers.py`:
```python
if training_args is not None:
    if training_args.bf16:
        # Report bf16 support so the Trainer's bf16/autocast code path is not
        # rejected on non-CUDA (Trainium) hardware.
        torch.cuda.is_bf16_supported = lambda: True
        # Keep stochastic rounding enabled for bf16 accumulations.
        os.environ["NEURON_RT_STOCHASTIC_ROUNDING_EN"] = "1"
        # Previous approach, now disabled:
        # training_args.bf16 = False
        # os.environ["XLA_USE_BF16"] = "1"
```
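For context, here is a minimal, self-contained sketch of what the autocast alternative does conceptually (this is not the Optimum-Neuron code; the CPU device type and the toy model are placeholders so it runs anywhere). Under `torch.autocast` the forward matmuls run in bf16 while the parameters and optimizer state stay in fp32, in contrast to `XLA_USE_BF16=1`, which casts all float tensors to bf16.

```python
import torch

# Toy model with fp32 parameters.
model = torch.nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
x = torch.randn(8, 16)

# Autocast region: ops run in bf16 where supported.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model(x)                    # matmul executed in bf16
    loss = out.float().pow(2).mean()  # loss computed back in fp32

loss.backward()   # gradients accumulate into the fp32 parameters
optimizer.step()
```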
For example:
```bash
XLA_USE_BF16=true torchrun --nproc_per_node=2 train.py --model_id bert-base-uncased --dataset_path lm_dataset --lr 2e-5 --per_device_train_batch_size 8 --epochs 3
```
```
...........
{'eval_loss': 1.2047382593154907, 'eval_f1': 0.8204884578328744, 'eval_runtime': 11.9553, 'eval_samples_per_second': 257.626, 'eval_steps_per_second': 16.143, 'epoch': 3.0}
```
Thank you, we will try that, but an f1 score of 0.8204884578328744 is 10% worse than what GPUs get with bf16.
Hello,
We created an example of how to fine-tune BERT on the Banking77 dataset, which has 77 labels. It works fine and achieves an f1 score of 0.84 (which is still 9% lower than on a GPU), but when we activate bf16 the f1 score drops to 0.02 and is completely garbage. Similarly, the train loss is not decreasing.

How to reproduce: use the torchrun command and add the bf16 parameter.
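To make the reproduction step concrete, here is a hedged sketch of the single change involved (the output directory names are placeholders and the hyperparameters simply mirror the earlier command; this is not the exact Banking77 example code). The only difference between the good and the broken run is `bf16=True` in the `transformers.TrainingArguments`:

```python
from transformers import TrainingArguments

# fp32 run: reaches f1 ~0.84 on Banking77.
args_fp32 = TrainingArguments(
    output_dir="bert-banking77",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=2e-5,
)

# bf16 run: identical except for this flag; f1 collapses to ~0.02 on Trainium.
args_bf16 = TrainingArguments(
    output_dir="bert-banking77-bf16",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=2e-5,
    bf16=True,
)
```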