arrayfire / arrayfire-ml

ArrayFire's Machine Learning Library.
BSD 3-Clause "New" or "Revised" License

Initial implementation of autograd #30

Closed pavanky closed 7 years ago

pavanky commented 7 years ago

What is done so far:

- Variable
- Functions

Example function:

    // Each function computes the forward result and returns a Variable that
    // records its inputs and a backward closure for gradient propagation.
    Variable operator +(const Variable &lhs, const Variable &rhs)
    {
        auto result = lhs.getData() + rhs.getData();
        // d(lhs + rhs)/d(lhs) = d(lhs + rhs)/d(rhs) = 1, so the incoming
        // gradient is passed through unchanged to both inputs.
        auto backward = [](std::vector<Variable> inputs, Variable grad_output) {
            inputs[0].addGrad(grad_output);
            inputs[1].addGrad(grad_output);
        };
        return Variable(result, {lhs, rhs}, backward);
    }
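
For operators whose gradients depend on the inputs, the backward closure can reuse the recorded inputs. As a sketch only (not part of this PR's diff), a multiplication operator could follow the same pattern under the Variable API assumed above; building the gradients out of Variable arithmetic would also keep them differentiable:

    // Sketch, assuming the same Variable API as operator+ above.
    // Product rule: d(lhs * rhs)/d(lhs) = rhs and d(lhs * rhs)/d(rhs) = lhs,
    // so each input's gradient is the incoming gradient times the other input.
    Variable operator *(const Variable &lhs, const Variable &rhs)
    {
        auto result = lhs.getData() * rhs.getData();
        auto backward = [](std::vector<Variable> inputs, Variable grad_output) {
            inputs[0].addGrad(grad_output * inputs[1]);
            inputs[1].addGrad(grad_output * inputs[0]);
        };
        return Variable(result, {lhs, rhs}, backward);
    }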

Example:

A simple example showcasing how this can be done currently

    void test()
    {
        using af::autograd::Variable;
        auto x = Variable(af::randu(5), true);
        af_print(x.array());
        auto y = Variable(af::randu(5), true);
        af_print(y.array());
        auto z = x * x + x * y + y * y;
        auto dz = Variable(af::constant(1.0, 5), false);
        z.backward(dz);
        // dz/dx = 2x + y and dz/dy = 2y + x, so both prints should be zero.
        auto dx = x.grad();
        auto dy = y.grad();
        af_print(dx.array() - 2 * x.array() - y.array());
        af_print(dy.array() - 2 * y.array() - x.array());
    }

TODO for this PR:

pavanky commented 7 years ago

@botev @jramapuram @itsnarsi This has been a long time coming, but I'd appreciate any feedback you guys might have as well.

pavanky commented 7 years ago

CC @arrayfire/core-devel

pavanky commented 7 years ago

@Reithan too

jramapuram commented 7 years ago

Awesome work @pavanky. I'll take a look in more detail when I get to a terminal. Quick question: can you take second derivatives with your implementation?

pavanky commented 7 years ago

@jramapuram Not yet, I wanted to get the first order working first :)

pavanky commented 7 years ago

@jramapuram I went ahead and changed the gradients to be Variables too. This should make it easy to compute higher-order derivatives.
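
To illustrate what that change is meant to enable, here is a hypothetical usage sketch (not code from this PR): a gradient that is itself a Variable with its own graph could be passed through backward again.

    // Hypothetical sketch: second derivative of z = x * x with respect to x,
    // assuming x.grad() is a Variable whose graph reaches back to x.
    using af::autograd::Variable;
    auto x = Variable(af::randu(5), true);
    auto z = x * x;
    z.backward(Variable(af::constant(1.0, 5), false));
    auto dz_dx = x.grad();  // dz/dx = 2 * x, itself a Variable
    // Differentiate the gradient again. Note that x.grad() accumulates, so
    // first- and second-order terms would mix unless grads are cleared first.
    dz_dx.backward(Variable(af::constant(1.0, 5), false));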

itsnarsi commented 7 years ago

@pavanky I just tested it on my laptop and it looks pretty neat. Unlike Python, I did not see any initial delay; I guess this might be because there is no JIT. When will this be merged into this repo?

pavanky commented 7 years ago

@itsnarsi This is still very nascent. I want to incorporate some of the stuff mentioned here to make it more efficient: http://pytorch.org/docs/master/notes/autograd.html#excluding-subgraphs
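
For reference, the linked PyTorch notes describe skipping gradient computation for subgraphs whose inputs do not require gradients. The boolean already passed to the Variable constructor in the example above suggests how this might look here; the following is only a sketch under that assumption:

    // Sketch, assuming Variable(array, bool) marks whether a Variable needs
    // gradients. Frozen parameters could then be excluded from backward.
    using af::autograd::Variable;
    auto frozen = Variable(af::randu(5), false);  // e.g. pretrained weights
    auto x      = Variable(af::randu(5), true);   // trainable input
    auto y      = frozen * x + x;
    y.backward(Variable(af::constant(1.0, 5), false));
    // Only x.grad() needs to be populated; the work for `frozen` (and anything
    // feeding into it) could be skipped entirely.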

pavanky commented 7 years ago

I've decreased the scope of the PR to get a minimum viable version going. The additional functions and operators can be added once this PR gets merged.

pavanky commented 7 years ago

@jramapuram I think enabling support for higher-order derivatives by default will increase memory usage. I am going to add a flag to enable it during the backward pass; by default, only the values will be stored.
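
At the call site that could look something like the sketch below; the flag name retain_grad_graph is only an assumption for illustration, not a settled API:

    // Hypothetical API sketch: the flag name is an assumption, not final.
    using af::autograd::Variable;
    auto x  = Variable(af::randu(5), true);
    auto z  = x * x;
    auto dz = Variable(af::constant(1.0, 5), false);
    // Default: gradients only store values, keeping memory usage low.
    // z.backward(dz);
    // Opt-in when higher-order derivatives are needed: also retain the graph
    // of each gradient so it can be differentiated again.
    z.backward(dz, /* retain_grad_graph = */ true);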