FluxML / Flux.jl

Relax! Flux is the ML library that doesn't make you tensor
https://fluxml.ai/
Other
4.47k stars 603 forks source link

"ADAM" and friends should be called "Adam" #795

Open DevJac opened 5 years ago

DevJac commented 5 years ago

The Adam algorithm is called "Adam" (not capitalized) in its paper: https://arxiv.org/pdf/1412.6980.pdf I believe this applies to some of the derivative optimizers like "Nadam" and "AdamW", and probably others.

ornithos commented 5 years ago

While this is true, there's precedent for capitalizing optimizers (see for instance https://arxiv.org/pdf/1705.07774.pdf). They live in a weird space between a proper noun and an initialism.

ornithos commented 5 years ago

I agree there is something a little weird here though. Why does ADAGrad capitalize the "ADA" section but not the "Grad" section (same for ADADelta)? If we are using upper camel case, it should be AdaGrad, AdaDelta, AdaM, NAdaM etc.

MikeInnes commented 5 years ago

I think most frameworks capitalise this way, probably due to historical accident (e.g. someone thought ADA was an acronym). I'd be fine with a rename, we'd just need deprecations and such.