Closed: AdamGleave closed this issue 5 years ago.
It's possible to write cost functions in such a way that they support both batched and non-batched inputs, but it's error prone. In particular, it's easy to write a function which works for `l`, `l_x` and `l_u` but fails on `l_xx`. I tend to assume that if the function evaluates correctly on a few test inputs, then since the derivatives are computed automatically they should be correct as well. So I found this behaviour quite counterintuitive, and it effectively requires testing twice as many cases. Here's an MWE of the problem:
```python
import numpy as np
from theano import tensor as T
from theano.printing import Print

from ilqr.cost import BatchAutoDiffCost


class DemoCost(BatchAutoDiffCost):
    def __init__(self, axis):
        def f(x, u, i, terminal):
            if terminal:
                control_cost = T.zeros_like(x[..., 0])
            else:
                control_cost = T.square(u).sum(axis=axis)
            foo = x[..., 0:2]
            state_cost = T.sqrt(T.sum(x, axis=axis))
            cost = state_cost + control_cost
            cost = control_cost
            return cost

        super().__init__(f, state_size=2, action_size=2)


def similar(x, y):
    x = abs(x - y) < 1e-8
    if np.isscalar(x):
        return x
    else:
        return x.all()


def test():
    correct_cost = DemoCost(axis=-1)  # this always works

    cost0 = None
    try:
        # This one is wrong, but can be created in both master and PR.
        cost0 = DemoCost(axis=0)
    except Exception as e:
        print('DemoCost(axis=0) failed with', e)

    cost1 = None
    try:
        # This one is correct. It cannot be created in master. It can in PR.
        cost1 = DemoCost(axis=1)
    except Exception as e:
        print('DemoCost(axis=1) failed with', e)

    x = np.array([1, 2])
    u = np.array([1, 2])
    for i, cost in enumerate([cost0, cost1]):
        print('run ', i)
        if cost is None:
            continue
        try:  # on PR branch
            print('l', similar(correct_cost.l(x, u, 0), cost.l(x, u, 0)))
        except Exception as e:  # on master branch
            print('Exception: ', e)
            print('l', similar(correct_cost.l(x, u, [0]), cost.l(x, u, [0])))
        print('l_x', similar(correct_cost.l_x(x, u, 0), cost.l_x(x, u, 0)))
        print('l_u', similar(correct_cost.l_u(x, u, 0), cost.l_u(x, u, 0)))
        print('l_xx', similar(correct_cost.l_xx(x, u, 0), cost.l_xx(x, u, 0)))
        print('l_uu', similar(correct_cost.l_uu(x, u, 0), cost.l_uu(x, u, 0)))


if __name__ == '__main__':
    test()
```
On my machine, on `master` this produces:
```
DemoCost(axis=1) failed with Not enough dimensions on Elemwise{sqr,no_inplace}.0 to reduce on axis 1
l True      # it appears to work...
l_x True
l_u True
l_xx False  # but is actually broken :(
l_uu True
```
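The silent `l_xx` failure traces back to the shape convention: `l`, `l_x` and `l_u` are evaluated on unbatched inputs, where `axis=0` and `axis=-1` happen to agree, while `l_xx` is evaluated on batched inputs, where `axis=0` reduces over the batch dimension instead. Here is a minimal NumPy sketch of that trap (illustrative only; the real code builds Theano expressions, and `control_cost` here is just a stand-in for the sum inside `f`):

```python
import numpy as np

def control_cost(u, axis):
    # Stand-in for T.square(u).sum(axis=axis) in the MWE above.
    return np.square(u).sum(axis=axis)

u = np.array([1.0, 2.0])        # unbatched, shape (action_size,)
u_batched = u[np.newaxis, :]    # batched, shape (1, action_size)

# Unbatched: axis=0 and axis=-1 agree, so l/l_x/l_u look correct.
assert control_cost(u, axis=0) == control_cost(u, axis=-1) == 5.0

# Batched: axis=0 reduces over the *batch* dimension rather than the
# action dimension, so the values fed into l_xx are silently wrong.
print(control_cost(u_batched, axis=0))   # [1. 4.] -- one value per action dim
print(control_cost(u_batched, axis=-1))  # [5.]    -- one value per batch item
```

The same function is thus correct or incorrect depending purely on which derivative it is being evaluated for, which is why the bug only surfaces at `l_xx`.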
By contrast, in the PR branch:
```
run 0   # this is wrong, but it's immediately obviously wrong -- it disagrees on l
l False
l_x True
l_u False
l_xx True
l_uu True
run 1   # this is correct
l True
l_x True
l_u True
l_xx True
l_uu True
```
Sorry, I forgot to address this!
I understand what you mean now. For completeness' sake, I was only able to reproduce your example by removing the `cost = control_cost` line.

And I didn't notice there was a discrepancy with the `i` vector between `l()` and `l_*()`. So you can fix that after all. Would you also mind making the same change to `BatchAutoDiffDynamics`, so they're both consistent?
You're right, the `cost = control_cost` line needs to be removed for my example to reproduce; apologies.

I've changed `self._i` to be a `dscalar` in both `BatchAutoDiffCost` and `BatchAutoDiffDynamics`. I also noticed that `BatchAutoDiffDynamics` had a similar issue of an inconsistent number of dimensions, which I've resolved analogously to `BatchAutoDiffCost`.
I believe this addresses all outstanding issues in this PR; let me know if there are any other changes you'd like me to make.
First of all, thanks for the great repo -- cleanest implementation of iLQR I've found and easily extensible for my needs.
The current implementation of `BatchAutoDiffCost` passes in the inputs without a batch in the first dimension for `l`, `l_x` and `l_u` (i.e. `x.shape == (state_size,)`), and inputs with a batch for `l_xx`, `l_ux` and `l_uu` (i.e. `x.shape == (batch_size, state_size)`). Although the cost function `l` can be written to handle both inputs, I find it quite fiddly to get right, and it means doing things like summing over `axis=-1` since `axis=1` doesn't exist in the non-batch case. In this PR I change the input to always have a batch in the first dimension (sometimes of length 1). I also redefine `i` to be a scalar, which I think it always will be.
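The shape conventions described above can be sketched in NumPy (a rough illustration of the axis semantics only; the real implementation operates on Theano tensors, and the variable names here are my own):

```python
import numpy as np

state_size = 2
x_unbatched = np.array([1.0, 2.0])        # old convention for l/l_x/l_u: (state_size,)
x_batched = x_unbatched[np.newaxis, :]    # new convention: always (batch_size, state_size)

# Under the old convention, axis=1 simply doesn't exist on the 1-D input,
# forcing cost functions to use axis=-1 to work in both cases:
try:
    x_unbatched.sum(axis=1)
except Exception as e:  # numpy raises AxisError (a ValueError/IndexError subclass)
    print('axis=1 on unbatched input:', e)

# With a guaranteed leading batch dimension, axis=1 always means "reduce over
# the state dimension", so the cost function no longer has to be
# shape-polymorphic:
assert x_batched.sum(axis=1).shape == (1,)  # one cost value per batch element
```

A batch of length 1 costs almost nothing but removes an entire class of shape bugs, since every call site now sees identically shaped inputs.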