FluxML / Flux.jl

Relax! Flux is the ML library that doesn't make you tensor
https://fluxml.ai/

Change abstract argument names to meaningful ASCII #915

Open janEbert opened 4 years ago

janEbert commented 4 years ago

Discussed in #853 (starting here). I suggest renaming the currently abstract, mathematical symbols used in argument names to something more meaningful. Most other deep learning frameworks such as TensorFlow, PyTorch, and MXNet lead by example here (although they are not entirely consistent either). While many parameters can be explained quite concisely, this may be hard for some: `betas` for Adam-like optimizers was kept as a name in all of the frameworks I listed. We could favor usability and leave these "standard" names as they are.

Examples (current -> suggestion)

For layers:

For optimizers:

In the interest of formula-lovers, we could reference an argument's usual mathematical symbol in the docstring. For example:

# Arguments
- `learning_rate`: [...] ``α`` or ``η`` in the literature.
MikeInnes commented 4 years ago

Focusing on the optimisers for the moment, the one other symbol I see is `gamma`. But I think that covers everything.

For `eta`, I suggest `rate`. For ADAM, `beta` seems fine; unlike the others, I don't see something like `momentum_decays` adding clarity here. This really just needs a description.

With ASCII names we can also consider making these things keyword arguments. I think `rate` should still be the (first, only) positional argument since it's so universal, with keywords for everything else.
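A minimal Julia sketch of what such a signature could look like for an Adam-style optimiser, with `rate` positional and everything else as keywords (names, defaults, and the struct itself are illustrative, not Flux's actual API):

```julia
# Hypothetical optimiser type; field names follow the renaming proposal.
struct ADAM
    rate::Float64                   # η in the literature
    betas::Tuple{Float64,Float64}   # (β₁, β₂) decay rates
    epsilon::Float64                # ϵ, numerical-stability term
end

# `rate` is the single positional argument; the rest are keywords.
ADAM(rate = 0.001; betas = (0.9, 0.999), epsilon = 1e-8) =
    ADAM(rate, betas, epsilon)

opt = ADAM(0.01, betas = (0.9, 0.99))
```

Because keyword arguments don't participate in dispatch in Julia, callers can set any subset of them by name without worrying about order, which is exactly what makes the ASCII names pay off.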

janEbert commented 4 years ago

I think `rate` is a bit too ambiguous. What do you think about `step_size`? And I agree with `betas`.

> With ASCII names we can also consider making these things keyword arguments. I think `rate` should still be the (first, only) positional argument since it's so universal, with keywords for everything else.

I agree with this. Then it would also make sense to consolidate all Descent-related optimizers into one, something like `Descent(rate; momentum=0.0, nesterov=false)` (also borrowed from TF).
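To make the consolidation concrete, here is a hedged Julia sketch of a unified `Descent` along those lines, including a scalar update step so the role of `rate` and `momentum` is visible (the type, `step!`, and all defaults are illustrative assumptions, not Flux's actual API; the Nesterov correction is omitted for brevity):

```julia
# Hypothetical unified SGD-style optimiser.
mutable struct Descent
    rate::Float64       # η in the literature
    momentum::Float64   # 0.0 disables momentum
    nesterov::Bool      # flag only; correction not implemented here
    velocity::Float64   # accumulated velocity for momentum
end

Descent(rate = 0.1; momentum = 0.0, nesterov = false) =
    Descent(rate, momentum, nesterov, 0.0)

# One update step for a scalar parameter θ given its gradient.
function step!(opt::Descent, θ, grad)
    opt.velocity = opt.momentum * opt.velocity - opt.rate * grad
    return θ + opt.velocity
end
```

With `momentum = 0.0` this reduces to plain gradient descent, so the separate `Momentum`/`Nesterov` types become keyword settings on one constructor.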

janEbert commented 4 years ago

Since this never got any traction, should I simply close it?

DhairyaLGandhi commented 4 years ago

I think it's a matter of getting agreement on the names, but something of this nature is valuable for when a font doesn't support some character or messes up the printing. FWIW, I second `rate` too; it's a pretty well understood term in the context of optimisers.

janEbert commented 4 years ago

I guess then we are only missing a name for `gamma`. :) How about `decay`, `decay_rate`, or `decay_factor`?