Liuhong99 / Sophia

The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training”

Use nn.GELU for GELU. Runs a bit faster #40

Open attesaarela opened 1 year ago

attesaarela commented 1 year ago

Use nn.GELU for computing GELU. It makes training run a couple of percent faster.
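
A minimal sketch of the suggested change, assuming the model currently applies a hand-written tanh-approximation GELU in its MLP block (nanoGPT-style code); the names `new_gelu` and `MLP`, and the exact layer sizes, are illustrative rather than the repo's actual code:

```python
import math
import torch
import torch.nn as nn


# Hand-written tanh-approximation GELU, typical of nanoGPT-style code (illustrative).
def new_gelu(x):
    return 0.5 * x * (1.0 + torch.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * torch.pow(x, 3.0))))


class MLP(nn.Module):
    """Illustrative transformer MLP block showing the swap to nn.GELU."""

    def __init__(self, n_embd, dropout=0.0):
        super().__init__()
        self.c_fc = nn.Linear(n_embd, 4 * n_embd)
        # Before: the forward pass called new_gelu(x) directly.
        # After: use the built-in module; approximate='tanh' keeps the same
        # approximation and dispatches to PyTorch's fused kernel
        # (requires PyTorch >= 1.12).
        self.gelu = nn.GELU(approximate='tanh')
        self.c_proj = nn.Linear(4 * n_embd, n_embd)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        x = self.c_fc(x)
        x = self.gelu(x)   # previously: x = new_gelu(x)
        x = self.c_proj(x)
        return self.dropout(x)
```

Note that the default `nn.GELU()` uses the exact erf-based formula, which differs slightly in value from the tanh approximation; passing `approximate='tanh'` keeps the activation numerically equivalent to the hand-written version while still using the optimized implementation.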