XuezheMax/apollo
Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization
Apache License 2.0
180 stars, 17 forks
issues
#11 CUDA out of memory when training LSTM with AdaHessian (yzlnew, closed, 2 years ago, 1 comment)
#10 Any changes possible to reduce GPU memory usage? (sjscotti, open, 2 years ago, 5 comments)
#9 Using Approximate Hessian to steer training towards wide flat minima (evanatyourservice, closed, 2 years ago, 2 comments)
#8 Different Sigma for 'belief' and 'constant' versions (evanatyourservice, closed, 3 years ago, 4 comments)
#7 Question about the convergence order (cam1681, closed, 3 years ago, 1 comment)
#6 hi, i use apollo to train resnet18 at imagenet-1k datasets, using the parameters in README, but best acc is only 69.9% while the baseline acc is 70.5%. Is it the params setting problem or others? (JLtwoP, closed, 3 years ago, 10 comments)
#4 Apollo applied to NMT (BUAAers, closed, 3 years ago, 3 comments)
#3 Any expectation on noisy data? (soloice, closed, 4 years ago, 6 comments)
#2 Why the name Apollo? (soloice, closed, 4 years ago, 2 comments)
#1 Slow convergence rate and lower accuracy than Adam? (githubharald, closed, 4 years ago, 6 comments)