Liuhong99 / Sophia

The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training”
MIT License

Availability of models? #21

Closed: ArthurConmy closed this issue 1 year ago

ArthurConmy commented 1 year ago

Hey! I'm fascinated by work finding competitors to Adam, particularly since, from an interpretability perspective, Adam may have some strange properties, such as probably causing outlier large dimensions in the residual stream.

Do you have access to any of the language models (such as GPT-2 Small-sized models) that you trained, so I could investigate this? A rough sketch of the kind of check I have in mind is below.
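
A minimal sketch (my own, not from this repo) of the check I mean, using the public Hugging Face `gpt2` checkpoint as a stand-in for a Sophia-trained model: measure per-dimension activation magnitudes in the residual stream and see whether a few dimensions dominate.

```python
# Hypothetical sketch: per-dimension residual-stream activation magnitudes for GPT-2.
# The "gpt2" checkpoint here is a placeholder; the real comparison would use a
# Sophia-trained checkpoint of the same size.
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# outputs.hidden_states: tuple of (num_layers + 1) tensors, each [batch, seq, d_model]
for layer, h in enumerate(outputs.hidden_states):
    per_dim = h.abs().mean(dim=(0, 1))        # mean |activation| per residual dimension
    ratio = per_dim.max() / per_dim.mean()    # how much the largest dimension sticks out
    print(f"layer {layer}: max/mean per-dim activation = {ratio:.1f}")
```

A large max/mean ratio in later layers would be the kind of outlier-dimension behavior I'd want to compare between Adam-trained and Sophia-trained checkpoints.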

Liuhong99 commented 1 year ago

Hi! We plan to release all model checkpoints, although it will take some time to figure out the license.