Closed · llauraa23 closed this 1 year ago
The language model is loaded in torch.float16. The Adam optimizer adds epsilon to the denominator to avoid division by zero. Note that torch.float16 rounds any number smaller than about 6e-8 to 0 (its smallest representable subnormal is 2**-24 ≈ 5.96e-8), so do not set epsilon smaller than 6e-8.
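A minimal sketch of the underflow behavior and a safe optimizer setup (the `Linear` model here is a hypothetical stand-in for the half-precision language model):

```python
import torch

# Values below ~6e-8 underflow to zero in float16
# (smallest float16 subnormal is 2**-24 ≈ 5.96e-8).
print(torch.tensor(1e-8, dtype=torch.float16))  # tensor(0., dtype=torch.float16)
print(torch.tensor(6e-8, dtype=torch.float16))  # tensor(5.9605e-08, dtype=torch.float16)

# Hypothetical half-precision model; keep eps >= 6e-8 so the
# denominator correction does not round to zero in float16.
model = torch.nn.Linear(16, 16).half()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, eps=6e-8)
```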
LGTM! 👍