Fix the issue of parameters updated as nan during reward model training.

CambioML / pykoi-rlhf-finetuned-transformers

pykoi: Active learning in one unified interface

https://www.cambioml.com

Apache License 2.0

Fix the issue of parameters updated as nan during reward model training. #69

Closed: llauraa23 closed this 1 year ago

llauraa23 commented 1 year ago

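The language model is loaded in torch.float16. The Adam optimizer adds epsilon to the denominator of its update to avoid division by zero. Note that the smallest positive value torch.float16 can represent is 2^-24 (about 6e-8), so a smaller epsilon, such as Adam's default of 1e-8, rounds to 0. The denominator can then become zero and the updated parameters turn into nan. Do not set epsilon below 6e-8.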
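A minimal sketch of the underflow, for illustration only. It uses the Python standard library's half-precision `struct` format to emulate float16 rounding rather than torch itself, and the helper names `to_float16` and `adam_denominator` are hypothetical, not part of pykoi or PyTorch. It assumes the optimizer state and epsilon are both held in float16.

```python
import struct


def to_float16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Smallest positive (subnormal) float16 value, about 5.96e-8.
SMALLEST_FP16 = 2.0 ** -24


def adam_denominator(v_hat: float, eps: float) -> float:
    """Emulate the sqrt(v_hat) + eps term of the Adam update in float16."""
    root = to_float16(v_hat ** 0.5)
    return to_float16(root + to_float16(eps))


# Adam's default epsilon is below the float16 range and rounds to zero.
eps_default = to_float16(1e-8)
# An epsilon of 6e-8 rounds to the smallest representable positive value.
eps_safe = to_float16(6e-8)

# With a zero second-moment estimate, only epsilon keeps the denominator nonzero.
denom_default = adam_denominator(0.0, 1e-8)
denom_safe = adam_denominator(0.0, 6e-8)

print("eps=1e-8 in float16:", eps_default, "denominator:", denom_default)
print("eps=6e-8 in float16:", eps_safe, "denominator:", denom_safe)
```

In PyTorch, epsilon is the `eps` argument of `torch.optim.Adam`, whose default is 1e-8. When the denominator above is zero and the first-moment estimate is also zero, the division is 0/0, which is nan in floating-point arithmetic.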

CambioML commented 1 year ago

LGTM! 👍