Liuhong99/Sophia
The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training”
MIT License
938 stars, 52 forks
issues
Is it applicable for any loss function? (#3, opened by subercui, closed, 1 year ago, 2 comments)
Evaluation on other domains (#2, opened by francqz31, closed, 1 year ago, 1 comment)
Trying to backward through the graph a second time (#1, opened by snykral, closed, 1 year ago, 2 comments)