Open DevJac opened 5 years ago
While this is true, there's precedent for capitalizing optimizers (see for instance https://arxiv.org/pdf/1705.07774.pdf). They live in a weird space between a proper noun and an initialism.
I agree there is something a little weird here though. Why does ADAGrad capitalize the "ADA" section but not the "Grad" section (same for ADADelta)? If we are using upper camel case, it should be AdaGrad, AdaDelta, AdaM, NAdaM etc.
I think most frameworks capitalise this way, probably due to historical accident (e.g. someone thought ADA was an acronym). I'd be fine with a rename, we'd just need deprecations and such.
The Adam algorithm is called "Adam" (not capitalized) in its paper: https://arxiv.org/pdf/1412.6980.pdf I believe this applies to some of the derivative optimizers like "Nadam" and "AdamW", and probably others.