issues
Liuhong99/Sophia
The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training”
MIT License
938 stars
52 forks
why did you delete SophiaH?
#53
Andron00e
closed
1 month ago
1
I think it is similar to RMSprop
#52
YooSungHyun
closed
1 month ago
1
Suggested new experiments: GPT2-small w/ Sophia on Fineweb-10B data
#51
sanyalsunny111
opened
3 months ago
1
Does Sophia work with activation checkpointing?
#50
ahmdtaha
closed
4 months ago
1
Reason for the discrete distribution
#49
dgm2
closed
4 months ago
2
What are the values of "bs" for vision?
#48
codonna9
closed
7 months ago
3
updated to Jan 2024 experiments
#47
Liuhong99
closed
10 months ago
0
I cannot reproduce the small and medium size GPT-2 results.
#46
dahunshin
closed
10 months ago
7
Few-shot evaluation code available?
#45
sanyalsunny111
closed
1 year ago
1
Incorrect "General Usage" code for Updating Hessian and Solution
#44
HFarkhari
closed
1 year ago
1
Unable to reproduce the GPT-2 small results
#43
pmpalang
closed
1 year ago
1
Implementing Sophia-H alternative
#42
thegodone
closed
1 year ago
2
Issue 37: When running without ddp, get_batch didn't work because it requires ddp…
#41
attesaarela
closed
1 year ago
0
Use nn.GELU for GELU. Runs a bit faster
#40
attesaarela
opened
1 year ago
0
Use pytorch2 optimized native attention
#39
attesaarela
opened
1 year ago
0
Sophia-H Implementation in third party
#38
robotzheng
closed
1 year ago
0
NameError: name 'ddp_rank' is not defined
#37
ThuanNaN
closed
1 year ago
1
Which is the original repo?
#36
SagiPolaczek
closed
1 year ago
2
Optimize the gradient step
#35
vmarkovtsev
opened
1 year ago
0
Bug in the per-coordinate clipping?
#34
vmarkovtsev
closed
1 year ago
2
Added basic setup.py to make this package installable with pip
#33
mikljohansson
closed
1 week ago
1
How to run sophia optimizer with huggingface trainer.
#32
Dominic789654
closed
1 year ago
0
Trying to reproduce: AdamW better than SophiaG after tuning
#31
adefazio
closed
1 year ago
11
Can't get good results on smaller models
#30
tsalex1992
closed
1 year ago
1
Please package Sophia as a PyPI package
#29
guilt
opened
1 year ago
0
which is the original code we should use?
#28
brando90
closed
1 year ago
2
About `bs`, `hessian_update`, and extending to tasks beyond CLM
#27
KeremTurgutlu
closed
1 year ago
1
Sophia with multitensor apply / FusedSophia
#26
skyshine102
closed
1 year ago
1
Sophia on jax
#25
sglucas
closed
1 year ago
4
Training on lit-llama failed to converge
#24
hx-Tang
closed
1 year ago
3
Hessian-vector product vs. Hessian estimator
#23
zhouyuan
closed
1 year ago
1
Where is the implementation of the estimators?
#22
logprobability
closed
1 year ago
3
Availability of models?
#21
ArthurConmy
closed
1 year ago
1
Does Sophia support multiple GPU nodes?
#20
skye-glitch
closed
1 year ago
1
Is Sophia-G a second-order optimizer?
#19
Godforever
closed
1 year ago
0
Does not reduce CrammingBERT training time
#18
tbaggu
closed
1 year ago
3
RuntimeError: Passing `optimizers` is not allowed if Fairscale, Deepspeed or PyTorch FSDP is enabled
#17
lw3259111
closed
1 year ago
2
Does this work with 16-mixed precision?
#16
tkella47
closed
1 year ago
3
Would you be interested in training a small foundational model with Sophia?
#15
Mythli
closed
1 year ago
3
Update prepare.py
#14
yhgon
opened
1 year ago
0
A new configurator?
#13
arman-hk
opened
1 year ago
0
Training LLMs such as BERT
#12
KongMingxi
closed
1 year ago
3
Minor shape error
#11
anruigu
closed
1 year ago
0
Having trouble replicating the result
#10
nalzok
closed
1 year ago
6
Fix typo in model.py
#9
eltociear
closed
1 year ago
0
Fix "NameError: name 'rho' is not defined"
#8
nalzok
closed
1 year ago
0
Sophia-H Implementation?
#7
nalzok
closed
1 year ago
2
Incomplete WandB logging
#6
nalzok
closed
1 year ago
1
Ambiguous dependency specification
#5
nalzok
closed
1 year ago
1
Does it support BF16?
#4
acostin1
closed
1 year ago
1