anassinator / ilqr

Iterative Linear Quadratic Regulator with auto-differentiable dynamics models
GNU General Public License v3.0
384 stars 80 forks

BatchAutoDiffCost: make shape consistent #7

Closed AdamGleave closed 5 years ago

AdamGleave commented 5 years ago

First of all, thanks for the great repo -- cleanest implementation of iLQR I've found and easily extensible for my needs.

The current implementation of BatchAutoDiffCost passes in the inputs without a batch in the first dimension for l, l_x and l_u (i.e. x.shape == (state_size,)), and inputs with a batch for l_xx, l_ux and l_uu (i.e. x.shape == (batch_size, state_size)).

Although the cost function l can be written to handle both inputs, I find it quite fiddly to get this right, and it means doing things like summing over axis=-1 since axis=1 doesn't exist in the non-batch case. In this PR I change the input to always have a batch in the first dimension (sometimes of length 1). I also redefine i to be a scalar, which I think it always will be.
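
To illustrate the shape issue in plain NumPy (the shapes and names here are illustrative assumptions, not the library's internals):

```python
import numpy as np

# Illustrative shapes (state_size=2, batch_size=4 are assumptions):
x_unbatched = np.zeros(2)     # what l, l_x, l_u receive on master
x_batched = np.zeros((4, 2))  # what l_xx, l_ux, l_uu receive on master

# A cost written with axis=1 raises on the unbatched input:
# np.square(x_unbatched).sum(axis=1)  # AxisError: axis 1 is out of bounds
# so you are forced to write axis=-1, which happens to work for both:
assert np.square(x_unbatched).sum(axis=-1).shape == ()
assert np.square(x_batched).sum(axis=-1).shape == (4,)

# After this PR, inputs always carry a leading batch dimension
# (possibly of length 1), so axis=1 always refers to the state axis:
x_now = np.zeros((1, 2))
assert np.square(x_now).sum(axis=1).shape == (1,)
```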

AdamGleave commented 5 years ago

It's possible to write cost functions in such a way that they support both batched and non-batched inputs, but it's error prone. In particular, it's easy to write a function which works for l, l_x and l_u but fails on l_xx. I tend to assume that if the function evaluates correctly on a few test inputs, then since the derivatives are computed automatically they should be correct as well. So I found this behaviour quite counterintuitive, and it effectively requires testing twice as many cases. Here's a MWE of the problem:

import numpy as np
from theano import tensor as T
from theano.printing import Print

from ilqr.cost import BatchAutoDiffCost

class DemoCost(BatchAutoDiffCost):
    def __init__(self, axis):
        def f(x, u, i, terminal):
            if terminal:
                control_cost = T.zeros_like(x[..., 0])
            else:
                control_cost = T.square(u).sum(axis=axis)

            state_cost = T.sqrt(T.sum(x, axis=axis))
            cost = state_cost + control_cost
            cost = control_cost  # note: overrides the line above (see discussion below)
            return cost

        super().__init__(f, state_size=2, action_size=2)

def similar(x, y):
    close = abs(x - y) < 1e-8
    if np.isscalar(close):
        return close
    else:
        return close.all()

def test():
    correct_cost = DemoCost(axis=-1)  # this always works
    cost0 = None
    try:
        # This one is wrong, but can be created in both master and PR.
        cost0 = DemoCost(axis=0)
    except Exception as e:
        print('DemoCost(axis=0) failed with', e)
    cost1 = None
    try:
        # This one is correct. It cannot be created in master. It can in PR.
        cost1 = DemoCost(axis=1)
    except Exception as e:
        print('DemoCost(axis=1) failed with', e)

    x = np.array([1, 2])
    u = np.array([1, 2])

    for i, cost in enumerate([cost0, cost1]):
        print('run ', i)
        if cost is None:
            continue
        try:  # on PR branch
            print('l', similar(correct_cost.l(x, u, 0), cost.l(x, u, 0)))
        except Exception as e:  # on master branch
            print('Exception: ', e)
            print('l', similar(correct_cost.l(x, u, [0]), cost.l(x, u, [0])))
        print('l_x', similar(correct_cost.l_x(x, u, 0), cost.l_x(x, u, 0)))
        print('l_u', similar(correct_cost.l_u(x, u, 0), cost.l_u(x, u, 0)))
        print('l_xx', similar(correct_cost.l_xx(x, u, 0), cost.l_xx(x, u, 0)))
        print('l_uu', similar(correct_cost.l_uu(x, u, 0), cost.l_uu(x, u, 0)))

if __name__ == '__main__':
    test()

On my machine, on master this produces:

DemoCost(axis=1) failed with Not enough dimensions on Elemwise{sqr,no_inplace}.0 to reduce on axis 1
l True  # it appears to work...
l_x True
l_u True
l_xx False  # but is actually broken :(
l_uu True

By contrast, in the PR branch:

run  0  # this is wrong, but it's immediately obviously wrong -- it disagrees on l
l False
l_x True
l_u False
l_xx True
l_uu True
run  1  # this is correct
l True
l_x True
l_u True
l_xx True
l_uu True

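
The asymmetry in the master output can be shown directly in NumPy (a sketch with assumed values, not the library's code): on a 1-D input, axis=0 and axis=-1 name the same axis, so l, l_x and l_u written with axis=0 evaluate correctly, but on the batched 2-D input that l_xx receives, axis=0 sums over the batch instead of the state dimension:

```python
import numpy as np

# Shapes mirroring the MWE (values are assumptions for illustration):
x_unbatched = np.array([1.0, 2.0])              # what l, l_x, l_u receive on master
x_batched = np.array([[1.0, 2.0], [3.0, 4.0]])  # what l_xx, l_ux, l_uu receive

# On a 1-D input, axis=0 and axis=-1 coincide, so a cost written with
# axis=0 looks correct when checked via l, l_x and l_u:
assert np.square(x_unbatched).sum(axis=0) == np.square(x_unbatched).sum(axis=-1)

# On the batched 2-D input, axis=0 sums across the batch instead of the
# state dimension, which is why only l_xx comes out wrong on master:
over_batch = np.square(x_batched).sum(axis=0)   # mixes batch rows together
per_sample = np.square(x_batched).sum(axis=-1)  # the intended per-sample cost
assert not np.array_equal(over_batch, per_sample)
```
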
anassinator commented 5 years ago

Sorry, I forgot to address this!

I understand what you mean now. For completeness' sake, I was only able to reproduce your example by removing the cost = control_cost line.

And I didn't notice there was a discrepancy with the i vector between l() and l_*(), so feel free to fix that as well. Would you also mind making the same change to BatchAutoDiffDynamics so they're both consistent?

AdamGleave commented 5 years ago

You're right the cost = control_cost line needs to be removed for my example to reproduce; apologies.

I've changed self._i to be a dscalar in both BatchAutoDiffCost and BatchAutoDiffDynamics. I also noticed that BatchAutoDiffDynamics had a similar issue with inconsistent numbers of dimensions, which I've resolved analogously to BatchAutoDiffCost.
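
For reference, the resulting convention can be sketched in plain NumPy (call_batched and quad_cost are hypothetical names for illustration, not part of the library): non-batched inputs are promoted to a batch of length 1 before the user's function sees them, so the function can always assume a leading batch axis:

```python
import numpy as np

# Hypothetical helper illustrating the convention adopted in this PR:
# promote non-batched inputs to a length-1 batch, call the user's
# function, then squeeze the singleton batch axis back out.
def call_batched(f, x, u, i):
    x = np.atleast_2d(x)  # (state_size,)  -> (1, state_size)
    u = np.atleast_2d(u)  # (action_size,) -> (1, action_size)
    out = f(x, u, i)      # f may safely use axis=1 / assume a batch axis
    return out[0] if out.shape[0] == 1 else out

# A cost written assuming batched inputs only:
quad_cost = lambda x, u, i: np.square(u).sum(axis=1)

print(call_batched(quad_cost, np.array([1.0, 2.0]), np.array([1.0, 2.0]), 0))
```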

I believe this addresses all outstanding issues in this PR; let me know if there are any other changes you'd like me to make.