iblislin closed this pull request 6 years ago.
Merging #396 into master will increase coverage by 1.68%. The diff coverage is 97.56%.
```diff
@@            Coverage Diff             @@
##           master     #396     +/-   ##
==========================================
+ Coverage    70.5%   72.18%    +1.68%
==========================================
  Files          27       27
  Lines        2058     2222     +164
==========================================
+ Hits         1451     1604     +153
- Misses        607      618      +11
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/base.jl | 33.7% <ø> (ø) | :arrow_up: |
| src/kvstore.jl | 75% <0%> (ø) | :arrow_up: |
| src/optimizers/rmsprop.jl | 100% <100%> (ø) | :arrow_up: |
| src/optimizers/adamax.jl | 100% <100%> (ø) | :arrow_up: |
| src/optimizers/adadelta.jl | 100% <100%> (ø) | :arrow_up: |
| src/optimizers/nadam.jl | 100% <100%> (ø) | :arrow_up: |
| src/optimizers/sgd.jl | 100% <100%> (+23.52%) | :arrow_up: |
| src/optimizers/adagrad.jl | 100% <100%> (ø) | :arrow_up: |
| src/optimizers/adam.jl | 100% <100%> (ø) | :arrow_up: |
| src/model.jl | 70.54% <80%> (+0.11%) | :arrow_up: |
| ... and 5 more | | |
Continue to review full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9d60663...43f1335. Read the comment docs.
Ready for review. I listed the changes in NEWS.md: https://github.com/dmlc/MXNet.jl/pull/396/files#diff-8312ad0561ef661716b48d09478362f3R263
The motivation of this PR is to build more elegant APIs than Python's (thanks to the good Unicode support in Julia's REPL and editor plugins), and to pave the way for porting Python's Gluon.
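For illustration, a hypothetical call in the Unicode style this PR proposes; the exact constructor signature here is an assumption for illustration, not copied from the PR:

```julia
using MXNet

# Hypothetical Unicode keyword API: η = learning rate, μ = momentum,
# λ = weight decay (the symbols debated in the review below).
opt = mx.SGD(η = 0.01, μ = 0.9, λ = 1e-4)
```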
Thanks for the efforts.
- I'm a bit against using η for learning rate, μ for momentum, and λ for weight decay. Although they seem to be commonly used, some papers use different symbols, which could be a bit confusing. But those are still OK if you insist.
- I'm strictly against using ∇c for gradient clip and ∇r for gradient rescaling. Those are just super confusing, because ∇ typically means taking the gradient with respect to something.
I agree with your point of view on gradient clipping and rescaling.
I want to hear what naming you would prefer. We can list all the candidates here, then vote.
Unfortunately, I don't have a better naming suggestion apart from the more verbose grad_clip and grad_scale.
Adding a few other options: clip, rescale, or scale.
Is it okay to omit the grad_ prefix, given that an optimizer mostly deals with gradients anyway?
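For concreteness, a sketch of how the unprefixed names might read in a call; the constructor and values are illustrative, not taken from the PR:

```julia
# Hypothetical: the same optimizer with the shorter, grad_-less option names.
opt = mx.SGD(η = 0.01, μ = 0.9, clip = 5.0, scale = 1 / 128)
```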
I did the renaming, please check it out.
good to go?
Sorry for the late reply. I still prefer the grad_ prefix, but I have no strong objection to the current naming, since these are options for optimizers, as you mentioned (and we are probably not going to have scales for momentum or other quantities). Please feel free to merge it.
Thanks a lot.
About the grad_ prefix: there is a way to add aliases in Julia 0.7 (https://github.com/JuliaLang/julia/pull/24960). Once we drop support for Julia 0.6, I can add the prefixed names as aliases.
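A minimal sketch of how such an alias could look, assuming it takes the form of a keyword argument whose default refers to an earlier keyword argument (whether that is exactly the mechanism the linked PR enables is an assumption; the constructor below is illustrative, not the actual MXNet.jl code):

```julia
# Sketch: grad_clip defaults to clip, so callers may use either keyword.
# The body reads grad_clip, which carries whichever value was passed.
function SGD(; clip = 0.0, grad_clip = clip, scale = 1.0, grad_scale = scale)
    println("clip = $grad_clip, scale = $grad_scale")
end

SGD(clip = 5.0)        # short name
SGD(grad_clip = 5.0)   # aliased name, same effect
```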
Alias sounds good!
I will sort out the summary later...