dmlc / MXNet.jl

MXNet Julia Package - flexible and efficient deep learning in Julia

Optimizer module overhaul #396

Closed iblislin closed 6 years ago

iblislin commented 6 years ago

will sort out summary later...

codecov-io commented 6 years ago

Codecov Report

Merging #396 into master will increase coverage by 1.68%. The diff coverage is 97.56%.

@@            Coverage Diff             @@
##           master     #396      +/-   ##
==========================================
+ Coverage    70.5%   72.18%   +1.68%     
==========================================
  Files          27       27              
  Lines        2058     2222     +164     
==========================================
+ Hits         1451     1604     +153     
- Misses        607      618      +11
Impacted Files                 Coverage Δ
src/base.jl                    33.7%   <ø>     (ø)
src/kvstore.jl                 75%     <0%>    (ø)
src/optimizers/rmsprop.jl      100%    <100%>  (ø)
src/optimizers/adamax.jl       100%    <100%>  (ø)
src/optimizers/adadelta.jl     100%    <100%>  (ø)
src/optimizers/nadam.jl        100%    <100%>  (ø)
src/optimizers/sgd.jl          100%    <100%>  (+23.52%)
src/optimizers/adagrad.jl      100%    <100%>  (ø)
src/optimizers/adam.jl         100%    <100%>  (ø)
src/model.jl                   70.54%  <80%>   (+0.11%)
... and 5 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 9d60663...43f1335.

iblislin commented 6 years ago

Ready for review. I've listed the changes in NEWS.md: https://github.com/dmlc/MXNet.jl/pull/396/files#diff-8312ad0561ef661716b48d09478362f3R263

The motivation of this PR is to build more elegant APIs than Python's (thanks to the good Unicode support in Julia's REPL and editor plugins), and to pave the way for porting Python's Gluon.
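For concreteness, here is a minimal usage sketch of the kind of API this enables. The keyword names (η, μ, λ) follow the symbols discussed below and are illustrative, not necessarily the exact merged interface:

    using MXNet

    # Hypothetical sketch: Unicode keyword arguments for the common
    # optimizer hyperparameters (learning rate, momentum, weight decay).
    opt = mx.SGD(η = 0.05, μ = 0.9, λ = 0.0001)

    # Roughly the Python-style spelling of the same thing, for comparison:
    # opt = mx.SGD(lr = 0.05, momentum = 0.9, weight_decay = 0.0001)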

pluskid commented 6 years ago

Thanks for the efforts.

iblislin commented 6 years ago
  • I'm a bit against using \eta for learning rate, \mu for momentum and \lambda for weight decay. Although these symbols seem to be commonly used, some papers use different ones, which could be a bit confusing. But those are still OK if you insist.
  • I'm strictly against using ∇c for gradient clipping and ∇r for gradient rescaling. Those are just super confusing, because ∇ typically means taking the gradient with respect to something, or similar.

I agree with your point of view on gradient clipping and rescaling.

I'd like to hear what naming you prefer. We can list all the candidates here, then vote.

pluskid commented 6 years ago

Unfortunately, I don't have a better naming suggestion apart from the more verbose grad_clip and grad_scale.

iblislin commented 6 years ago

Added another option:

iblislin commented 6 years ago

Is it okay to omit the grad_ prefix, given that the optimizer deals with gradients anyway?
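For concreteness, the candidates weighed in this thread, shown as hypothetical keyword arguments on an SGD call (the names are illustrative, not necessarily the merged API):

    using MXNet

    # 1. ∇-prefixed symbols (ruled out above as confusing, since ∇x reads
    #    as "gradient of x"):
    # opt = mx.SGD(η = 0.05, ∇c = 5.0, ∇r = 1/64)

    # 2. The verbose grad_ prefix suggested above:
    # opt = mx.SGD(η = 0.05, grad_clip = 5.0, grad_scale = 1/64)

    # 3. Bare names, relying on the optimizer context to imply "gradient":
    opt = mx.SGD(η = 0.05, clip = 5.0, scale = 1/64)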

iblislin commented 6 years ago

I did the renaming; please check it out.

iblislin commented 6 years ago

good to go?

pluskid commented 6 years ago

Sorry for the late reply. I still prefer the grad_ prefix, but I have no strong objection to the current naming, since these are options of the optimizers, as you mentioned (and we probably are not going to have scaling for momentum or other quantities). Please feel free to merge it.

iblislin commented 6 years ago

Thanks a lot.

About the grad_ prefix: there is a way to add aliases in Julia 0.7 (https://github.com/JuliaLang/julia/pull/24960). Once we drop support for Julia 0.6, I can add the prefixed names as aliases.
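A minimal sketch of how such a keyword alias could look, using default values that refer to earlier keyword arguments; the type and field names are hypothetical, and this is not necessarily the mechanism added by the linked Julia PR:

    # Hypothetical standalone example: the prefixed keyword defaults to the
    # short one, so both spellings construct the same object.
    struct ToySGD
        η::Float64
        clip::Float64
    end

    ToySGD(; η = 0.01, clip = 0.0, grad_clip = clip) = ToySGD(η, grad_clip)

    # Both of these yield ToySGD(0.01, 5.0):
    # ToySGD(clip = 5.0)
    # ToySGD(grad_clip = 5.0)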

pluskid commented 6 years ago

Alias sounds good!