pavanky closed this pull request 7 years ago.
@botev @jramapuram @itsnarsi This has been a long time coming, but I'd appreciate any feedback you might have as well.
CC @arrayfire/core-devel
@Reithan too
Awesome work @pavanky. Will take a look in more detail when I get to a terminal. Quick question: can you take second derivatives with your implementation?
@jramapuram Not yet, I wanted to get the first order working first :)
@jramapuram I went ahead and changed the gradients to be Variables too. This should make it easy to perform higher-order derivatives.
@pavanky Just tested it on my laptop and it looks pretty neat. Unlike Python, I did not see any initial delay; I guess that might be because there is no JIT involved. When will this be merged into this repo?
@itsnarsi This is still very nascent. I want to incorporate some of the stuff mentioned here to make it more efficient: http://pytorch.org/docs/master/notes/autograd.html#excluding-subgraphs
I've decreased the scope of the PR to get a minimum viable version going. The additional functions and operators can be added once this PR gets merged.
@jramapuram I think enabling support for higher-order derivatives by default will increase memory usage, so I am going to add a flag to enable it during the backward pass. By default, only the values will be stored.
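As a rough illustration of that plan (the extra parameter shown here is hypothetical and not part of the current API), usage could look something like this:

```cpp
// Hypothetical sketch of the proposed opt-in flag; the second parameter to
// backward() is an assumption, not part of the current API.
auto x  = Variable(af::randu(5), true);
auto z  = x * x;
auto dz = Variable(af::constant(1.0, 5), false);

z.backward(dz);        // default: only gradient values are stored
z.backward(dz, true);  // opt in: retain the graph so higher-order derivatives work
```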
What is done so far:

- `autograd::Variable`, `autograd::backward`
- `Variable` hides the underlying `af::array` from the user.
- When `var.backward(grad_var)` is invoked, it builds a DAG as a vector starting with the current variable and propagates gradients down the graph to all the `Variable`s in the graph using the grad function specified at each variable. A variable can be excluded from gradient calculation with `var.setCalcGrad(false)`.
- Functions take `Variable` parameters and return a `Variable`.
- The returned `Variable` is constructed using the following arguments as parameters:
  - `af::array`: The result calculated earlier
  - `vector<Variable>`: containing the inputs to the function
  - `BackwardFunction_t`: A function pointer to the backward pass, usually implemented as a lambda function.
- Example function:
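A minimal sketch of what such a function could look like, based on the constructor described above; the include paths and the `array()`/`addGrad()` accessor names are assumptions rather than the final API:

```cpp
// Sketch of a differentiable multiply built from the described
// Variable(value, inputs, backward) constructor. Include paths and the
// array()/addGrad() accessor names are assumptions.
#include <vector>
#include <arrayfire.h>
#include <af/autograd/Variable.hpp>

using af::autograd::Variable;

Variable operator*(const Variable &lhs, const Variable &rhs)
{
    // Forward pass: compute the value eagerly on the wrapped af::array
    auto result = lhs.array() * rhs.array();

    // Backward pass: given the gradient of the output, accumulate the
    // gradient of each input (d(l*r)/dl = r, d(l*r)/dr = l)
    auto backward = [](std::vector<Variable> &inputs, const Variable &grad_output) {
        inputs[0].addGrad(grad_output * inputs[1]);
        inputs[1].addGrad(grad_output * inputs[0]);
    };

    // The resulting Variable stores the value, its inputs, and the backward function
    return Variable(result, {lhs, rhs}, backward);
}
```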
- Example: A simple example showcasing how this can be done currently.
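A minimal usage sketch, assuming a `Variable(af::array, bool calc_grad)` constructor and the `backward()`/`grad()` accessors described above (the include path is also an assumption):

```cpp
// Minimal usage sketch; the Variable(af::array, bool) constructor, the grad()
// accessor, and the include path are assumptions based on the description above.
#include <arrayfire.h>
#include <af/autograd.h>

using af::autograd::Variable;

int main()
{
    auto x = Variable(af::randu(5), true);   // track gradients for x
    auto y = Variable(af::randu(5), true);   // track gradients for y

    auto z = x * y + x;                      // forward pass builds the DAG

    // Seed gradient of ones with the same shape as z
    auto dz = Variable(af::constant(1.0, 5), false);
    z.backward(dz);                          // propagate gradients down the graph

    af::print("dz/dx", x.grad().array());    // expected: y + 1
    af::print("dz/dy", y.grad().array());    // expected: x
    return 0;
}
```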
TODO for this PR:
- [x] Add train and evaluation mode for modules