/deepspeed/runtime/fp16/loss_scaler.py", line 175, in update_scale
raise Exception(
Exception: Current loss scale already at minimum - cannot decrease scale anymore. Exiting run.
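Hello author, I'd like to ask two questions:
Why can't training run normally under fp16? After roughly a few thousand steps it raises this error, and the loss overflows.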
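
For context on what the message means, here is a toy sketch of dynamic loss scaling as I understand it. It is not DeepSpeed's actual code: the class name, parameter names, and default values below are hypothetical and chosen only for illustration. The idea is that the scale shrinks each time the fp16 gradients overflow and grows again after a run of clean steps, so hitting the minimum means overflow was detected on many steps without enough clean steps in between to recover.

```python
class ToyDynamicLossScaler:
    """Illustrative dynamic loss scaler (hypothetical, not DeepSpeed's class)."""

    def __init__(self, init_scale=2.0 ** 16, scale_factor=2.0,
                 scale_window=1000, min_scale=1.0):
        self.cur_scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window  # clean steps needed before growing
        self.min_scale = min_scale
        self.good_steps = 0

    def update_scale(self, overflow):
        if overflow:
            # Nothing left to shrink: give up, mirroring the error in the trace.
            if self.cur_scale <= self.min_scale:
                raise RuntimeError("loss scale already at minimum")
            self.cur_scale = max(self.cur_scale / self.scale_factor,
                                 self.min_scale)
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps % self.scale_window == 0:
                self.cur_scale *= self.scale_factor


def steps_until_failure(scaler, max_steps=100):
    """Report overflow on every step; return the step index that raises."""
    for step in range(max_steps):
        try:
            scaler.update_scale(overflow=True)
        except RuntimeError:
            return step
    return None
```

With these assumed defaults, 16 consecutive overflowing steps bring the scale from 2^16 down to 1, and the next overflow raises. In a real run this pattern usually means the gradients or the loss have already become inf/NaN, so no scale value can help.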