@HannesSommer @pfmark
Okay, let's finish this thing :).
The core piece is x -> 1/(1 + exp(x)). After we have that function properly implemented together with its Jacobian, it is easy to get the logistic function right, because that only involves chaining with affine linear transformations. To do that with minimal hassle I would keep the interface we have and just add two functions, oneOverOnePlusExp and oneOverOnePlusExpJacobian, test them separately until they are as good as we need them to be, and use them later in the implementation of the ScalarExpressionNodeInverseSigmoid. This will remove all the extra threshold complication we have when considering the additional parameters of the logistic function.

Very good ideas. tanh looks suitable indeed, atan too. We should try both. I agree that solving the "flipped" sigmoid problem is probably a waste of time. I'll give it a try.
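To make the oneOverOnePlusExp idea concrete, here is a minimal sketch (not the actual implementation, just one way to make the evaluation overflow-safe; names follow the proposal above):

```cpp
#include <cmath>

// Numerically stable evaluation of f(x) = 1 / (1 + exp(x)).
// For large positive x, exp(x) overflows, so we rewrite the expression as
// exp(-x) / (1 + exp(-x)), which only evaluates exp of a non-positive number.
double oneOverOnePlusExp(double x) {
  if (x >= 0.0) {
    const double e = std::exp(-x);   // in (0, 1], cannot overflow
    return e / (1.0 + e);
  }
  return 1.0 / (1.0 + std::exp(x));  // exp(x) <= 1 here, also safe
}

// Jacobian (derivative w.r.t. x): f'(x) = -f(x) * (1 - f(x)).
// Expressing it through f itself avoids evaluating exp(x) again and
// decays gracefully to 0 in both tails instead of overflowing.
double oneOverOnePlusExpJacobian(double x) {
  const double f = oneOverOnePlusExp(x);
  return -f * (1.0 - f);
}
```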
Maybe I wasn't clear about tanh: it is really basically the same function. It is only an affine linear transformation of the original logistic function, so it won't help. Of course it is still nice to have it in the list of AD-supported functions, but for the planner it won't help. I only mentioned tanh as an idea to look for in case somebody has solved the numerical problems for tanh already, because then one could reuse those solutions for our logistic function.
atan is a completely different story and might very well be nicer numerically and maybe even better suited.
Ah, wait, there is a std::tanh implementation already? Then that could be a good implementation already, and it could actually help. Have you seen the relation to the logistic function (see https://en.wikipedia.org/wiki/Logistic_function)? You can use it as a drop-in replacement because 1 / (1 + exp(-x)) = 1/2 * (1 + tanh(x / 2)).
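In code the drop-in form would look roughly like this (just a sketch using std::tanh; the scale/shift/height parameters of the full logistic node are left out):

```cpp
#include <cmath>

// Logistic function written via std::tanh, using the identity
// 1 / (1 + exp(-x)) = 0.5 * (1 + tanh(x / 2)).
// std::tanh saturates cleanly at +/-1, so this never overflows.
double logisticViaTanh(double x) {
  return 0.5 * (1.0 + std::tanh(0.5 * x));
}

// Its derivative, 0.25 * (1 - tanh(x/2)^2), which also decays to zero
// in both tails without any intermediate overflow.
double logisticViaTanhDerivative(double x) {
  const double t = std::tanh(0.5 * x);
  return 0.25 * (1.0 - t * t);
}
```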
That makes sense, I'll have to try it. The atan version works okay for planning, but I'm not sure we want the increased slope for larger x. The slope is not enough to push something out of obstacles, but it makes the avoidance distance quite sensitive to the parameters of the atan function. It might make more sense to have a steep slope and then switch to linear.
But far away the slope is basically the same (the difference stays smaller than one for the unscaled x!). Only shortly after the center point is atan a bit steeper. Isn't that what we want?
See http://www.wolframalpha.com/input/?i=atan%28x%29+-+tanh%28x%29 for atan - tanh and http://www.wolframalpha.com/input/?i=1%2F%281%2B+x^2%29+-+1%2Fcosh^2%28x%29 for their derivatives (atan' - tanh').
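For a quick numerical check of that comparison (illustrative only; it prints atan - tanh, which stays bounded below one, and the two derivatives 1/(1+x^2) and 1/cosh(x)^2):

```cpp
#include <cmath>
#include <cstdio>

int main() {
  // atan(x) - tanh(x) tends to pi/2 - 1 ~= 0.571, i.e. stays below 1,
  // while both derivatives go to zero, atan' decaying much more slowly.
  const double xs[] = {0.5, 1.0, 2.0, 5.0, 10.0};
  for (double x : xs) {
    const double diff = std::atan(x) - std::tanh(x);
    const double datan = 1.0 / (1.0 + x * x);
    const double dtanh = 1.0 / (std::cosh(x) * std::cosh(x));
    std::printf("x=%5.1f  atan-tanh=%.4f  atan'=%.4e  tanh'=%.4e\n",
                x, diff, datan, dtanh);
  }
  return 0;
}
```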
What I meant is that for a reasonable gradient inside obstacles the shift parameter must not be too large. That implies that the function has a significant impact on the cost function also outside of (but close to) obstacles. We can hardly change the "influence region", since modifying the scale parameter affects the ability to push something out of obstacles. Of course this is the same for the sigmoid; it is just an argument to look for something non-symmetric.
For testing one could introduce a pre-scale factor for the negative part of the signed distance field. If that factor is greater than one, the trajectory would get pushed harder out of the obstacles without changing the slope outside of them.
Oh interesting, that's a cool idea. You mean downscale the negative distances so they get closer to the "step"?
That's a bit difficult to implement in the distance field, since we use the shift parameter to incorporate the agent radii. So for an agent with center point outside the obstacle but boundary inside, the distance from the grid is positive.
Ah, good point, I forgot about that. It is in any case equivalent to applying a piecewise factor (1 for x > 0 and a larger one for x < 0) to the distance. I was thinking of the distance map mostly for performance. But I also forgot that we need continuous derivatives, don't we? Maybe one should in fact apply a differentiable function with a similar graph. Or just go for that linear, differentiable continuation of the positive half of the sigmoid to the negative side, as you suggested.
Or we use a multiplicative function consisting of three polynomials: linear for x < -eps, cubic for -eps < x < 0 so that values and first derivatives match on both sides, and constant 1 for x > 0. Basically the same as Hermite spline interpolation, to get the desired function 1 - x for negative x and 1 otherwise. A sketch of this is below.
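Something like this (just a sketch; eps and the function name are placeholders):

```cpp
#include <cmath>

// Piecewise multiplier as described above: 1 for x >= 0, the linear function
// 1 - x for x <= -eps, and a cubic Hermite segment on [-eps, 0] that matches
// values and first derivatives on both sides, so the factor is C^1 continuous.
double obstaclePrescale(double x, double eps = 0.1) {
  if (x >= 0.0) return 1.0;
  if (x <= -eps) return 1.0 - x;

  // Hermite data on [a, b] = [-eps, 0]:
  // p(a) = 1 + eps, p'(a) = -1 (from 1 - x);  p(b) = 1, p'(b) = 0 (constant 1).
  const double a = -eps, b = 0.0, h = b - a;
  const double t = (x - a) / h;                 // t in [0, 1]
  const double h00 = 2*t*t*t - 3*t*t + 1;       // standard cubic Hermite basis
  const double h10 = t*t*t - 2*t*t + t;
  const double h01 = -2*t*t*t + 3*t*t;
  const double h11 = t*t*t - t*t;
  return h00 * (1.0 + eps) + h10 * h * (-1.0) + h01 * 1.0 + h11 * h * 0.0;
}
```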
It seems reasonable to continue any function having the desired properties on the positive abscissa (like a steep slope at zero and going to zero as x -> inf) with a quadratic function on the negative abscissa. This way, optimizers which locally use a quadratic approximation should get out of obstacles very quickly.

Yet I observed a very large number of iterations with the current BFGS implementation. The reason AFAIK is the following: in the first step of BFGS the approximate Hessian is the identity, so the first step is steepest descent (with line search). Since we have a very large gradient inside obstacles, the line search returns a very small step size, which is OK. The update of the approximate Hessian should then actually return the true Hessian for a quadratic function (todo: check that this is true). Now if we used a step size of one and had a pure quadratic function, the next step should bring us to the minimum. Yet in our implementation we use the (very small) step size of the first iteration to seed the line search in the second iteration, so we make very small progress in the following iterations. That seems wrong and raises the question of how to set the initial step length.
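The seeding logic I mean would roughly be the following (hypothetical names, not our actual BFGS code):

```cpp
// Sketch of the line-search seeding discussed above. The first iteration is
// effectively steepest descent, so a tiny accepted step is fine there, but
// once the BFGS Hessian approximation carries curvature information the
// natural trial step is 1.0. Reusing the previous (tiny) accepted step length
// as the seed throws that information away.
double initialTrialStepLength(int iteration, double previousStepLength) {
  if (iteration == 0) {
    // No curvature information yet: use a conservative caller-supplied guess
    // (e.g. scaled by the gradient norm).
    return previousStepLength;
  }
  // Quasi-Newton direction available: always try the full step first and let
  // the line search backtrack only if needed.
  return 1.0;
}
```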
@helenol hey Helen, do you have something in your framework that can reliably push spline trajectories out of obstacles during gradient based optimization in case the initial guess is in collision? We struggle to find something that works if the trajectory is really going completely through a bigger obstacle.
Reliably is a strong word... It's kind of initialization dependent (if it's 0 gradient at initialization, it really struggles obviously, but otherwise works), but for the most part, yes. The discrete-time approach is CHOMP: http://repository.cmu.edu/cgi/viewcontent.cgi?article=1040&context=robotics Then my paper is the continuous-time version of that based on high-degree polynomial splines: http://helenol.github.io/publications/iros_2016_replanning.pdf
The code is a bit rough at the moment, but it's based on our polynomial trajectory framework: https://github.com/ethz-asl/mav_planning_utils
Let me know if you want more details/hook you up with the code.
This was fixed, wasn't it, @uschwes?
At least the numerical issues, yes.
Maybe we should close it here and move whatever is unsolved to new issues wherever appropriate?
I think the days of the MaxEnt framework are numbered, let's just close.
There are currently three issues with the ScalarExpressionNodeInverseSigmoid functionality:
IMHO we should compute the error in the gradient method and check for overflow; if it overflows, we return zero. In addition we have to compute a suitable threshold for the Taylor series expansion based on the scale, shift and height values and use this threshold in both the error and the gradient computation. This is unfortunately not trivial, since we need a function that computes in advance the maximum x value, dependent on height, scale and shift, at which the gradient computation will not overflow.
I believe these problems make the BFGS optimizer fail from time to time.
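For the overflow check, something along these lines could work (illustrative sketch only; the exact parameterisation of ScalarExpressionNodeInverseSigmoid may differ, here assumed to be f(x) = height / (1 + exp(scale * (x - shift)))):

```cpp
#include <cmath>
#include <limits>

// Overflow-guarded gradient of f(x) = height / (1 + exp(scale * (x - shift))).
// The argument of exp() is checked against the largest value that still yields
// a finite double; past that point the true gradient is numerically zero
// anyway, so returning 0 avoids the inf/inf = NaN case.
double inverseSigmoidGradient(double x, double height, double scale, double shift) {
  const double arg = scale * (x - shift);
  // log(DBL_MAX) ~= 709.78; beyond this exp(arg) overflows to inf.
  static const double kMaxExpArg = std::log(std::numeric_limits<double>::max());
  if (arg > kMaxExpArg) {
    return 0.0;  // gradient has underflown to zero long before exp overflows
  }
  const double e = std::exp(arg);
  const double denom = 1.0 + e;
  return -height * scale * e / (denom * denom);
}
```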