EleutherAI / pythia

The hub for EleutherAI's work on interpretability and learning dynamics
Apache License 2.0

The value of weight decay #132

Closed yehuitang closed 8 months ago

yehuitang commented 8 months ago

Thanks for your interesting work! What value of weight decay was used when training the models? I note that it is 0.1 in the GitHub config (https://github.com/EleutherAI/pythia/blob/afec006b936a38131fc6c6d2b0a48425dd97bc6e/models/1B/pythia-1b.yml) but 0.01 in the paper (Table 6 in Appendix E): https://arxiv.org/pdf/2304.01373.pdf

Thanks!

haileyschoelkopf commented 8 months ago

Hi, this is an error in the paper, apologies. We used 0.1 for weight decay, matching the config files in this repo.
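
For anyone who wants to check a config file against the paper themselves, here is a minimal sketch of reading the weight decay setting out of config text. The `SAMPLE_CONFIG` excerpt, the `"weight-decay"` key name, and the `read_weight_decay` helper are all illustrative assumptions rather than content copied from the repo, so adjust them to match the file you are actually looking at.

```python
import re

# Hypothetical excerpt written in the style of a training config.
# The key name "weight-decay" is an assumption; check the real file.
SAMPLE_CONFIG = '''
{
  "weight-decay": 0.1,
  "gradient-clipping": 1.0
}
'''

# Value printed in Table 6 of the paper, which the reply above says is an error.
PAPER_VALUE = 0.01


def read_weight_decay(config_text, key="weight-decay"):
    """Return the float stored under `key`, or None if the key is absent."""
    # Match an optionally quoted key, a colon, then a numeric literal.
    pattern = r'"?' + re.escape(key) + r'"?\s*:\s*([0-9.eE+-]+)'
    match = re.search(pattern, config_text)
    return float(match.group(1)) if match else None


config_value = read_weight_decay(SAMPLE_CONFIG)
matches_paper = config_value == PAPER_VALUE
print("config:", config_value, "paper:", PAPER_VALUE, "match:", matches_paper)
```

To run it on a real file, pass the file's contents (for example from `open(path).read()`) in place of `SAMPLE_CONFIG`. A regex is used here only to avoid a YAML dependency; a proper parser would be more robust.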