artofnothingness / mppic

MIT License

adding gamma term on noise smoothing to base implementation (ready for review) #99

Closed SteveMacenski closed 1 year ago

SteveMacenski commented 1 year ago

@tkkim-robot is this formulation correct?

SteveMacenski commented 1 year ago

@artofnothingness can you test this and sanity check that you see the same improvements (or at least non-regressions) I do?

You should see basically the same behavior, but it moves 6% faster while doing so. It's potentially a hair smoother as well, but I didn't try to quantify it.

tkkim-robot commented 1 year ago

@SteveMacenski This is basically correct. But 'gamma/std' should be 'gamma/(std^2)': $\Sigma$ in the paper means variance, so the denominator should be the square of the std, $\sigma^2$.
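
To spell that out (same notation as the paper): with a diagonal sampling covariance $\Sigma = \mathrm{diag}(\sigma^2)$, the per-axis control-cost term is

$$\gamma \, u_t^\top \Sigma^{-1} v_t = \frac{\gamma}{\sigma^2}\, u_t v_t,$$

so the denominator in code should be the std squared (e.g. sampling_std.vx * sampling_std.vx), not the std itself.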

artofnothingness commented 1 year ago

@SteveMacenski I'll take a look at this and test it after reading the paper.

SteveMacenski commented 1 year ago

OK - adding the squared version, and now testing the parameters to make sure they're still the right values with that change.

SteveMacenski commented 1 year ago

Adding the squared version significantly damages performance: we no longer hit 0.4 m/s (of the 0.5 m/s desired speed); instead, we're at 0.2 m/s. I changed gamma to 1.0 and the robot didn't move (honestly not shocking, sounds about right). Dropping gamma to 0.0 brings us back to the original behavior (since the term then has no impact).

I'm seeing something like 0.01 - 0.02 having decent, similar performance, which is much lower than what you've mentioned from your experience and what other papers mention $\gamma$ being. Is this perhaps because it's washing out our other cost functions? Both "Information Theoretic Model Predictive Control: Theory and Applications to Autonomous Driving" and your SMPPI paper are comparing on a racetrack experiment with similar cost functions, quadratic critic function orders, and weights. Or have you seen 0.1 holding up well in other experiments too? Your weights for the slip / track functions are pretty high (10,000).

I found that if I square all of the other critic functions, though, 0.1 works OK. Right now, all of our critic functions are linear, not quadratic, penalties. If we made them all squared, we'd have to retune / balance them w.r.t. each other (the behavior I saw was "fine" enough to prove the point that this new term works, but it did some awkward stuff, which is the fault of the untuned new critic gains). I suspect that's why squaring the std causes the issue of needing a much-reduced $\gamma$.

Is there something in the theoretical foundations that requires us to use $\Sigma$ as squared if our penalty functions are linear? If not, we could give this term the same cost_power parameterization as the other critics, so that it's of the same function order as the other critics for balancing purposes. That would be preferable to having to retune every penalty, which would be time consuming - if possible. Using the non-squared std and gamma at 0.1 seems to work fine in practice... but might not be good form.

tkkim-robot commented 1 year ago

That is a valuable observation. First of all, the std in the control cost should be squared in theory. The control cost is theoretically derived from the free energy and importance sampling. The origin of this is the probability density function (pdf) of $q$, the controlled distribution of MPPI:

$$q(V) = \frac{1}{Z} \exp\left(-\frac{1}{2} \sum_{t=0}^{T-1} (v_t - u_t)^\top \Sigma^{-1} (v_t - u_t)\right)$$

The above equation is from the IT-MPC paper, and this pdf is defined by the multivariate normal distribution. So, basically, $\Sigma$ is the square of the std.

I also don't see any relation between the form of the cost function and the std. MPPI allows any type of cost function (denoted by $S(V; x_0)$), so I don't think that is the issue. The 10,000 weight in the cost function is a hard cost that penalizes lethal behaviors, such as collisions, so the rest of the trajectories are not getting this large penalty.

SteveMacenski commented 1 year ago

OK. Squaring the inverse of a small number makes a very small denominator, which blows out the other critics as currently tuned - e.g., a 0.2 std (1/0.2 = 5) is a 0.04 variance (1/0.04 = 25).

Our options are to retune every critic to use this value with a target gamma of 0.1, or to drop gamma to something very low (0.01-0.02). The critics have their cost functions squared, whereas this one virtually just has a squared weight (gamma / std^2) applied. So maybe just having a super low gamma would be ok? I'm seeing something in the ballpark of 0.01 - 0.02 being pretty good with temperature at 0.15 (if dropping the temperature more, increase gamma - there's some tuning to be done to balance these two for the best results in speed / smoothness). I suppose, though, it does have values squared in the $v_t \cdot \epsilon_t$ sense. Might be worth me retuning just to see.

But reading "Information Theoretic Model Predictive Control: Theory and Applications to Autonomous Driving", the term is

$$\gamma \sum_{t=0}^{T-1} u_t^\top \Sigma^{-1} v_t$$

where $v_t \sim \mathcal{N}(u_t, \Sigma)$ (pg. 2), so shouldn't this actually be

gamma / sampling_std.vx^2 * control_sequence_.vx * state_.cvx

Not using the bounded noises (e.g. state_.cvx - control_sequence_.vx)? Your paper uses the bounded cost (with the action stuff), but for this PR, given it's just adding the term to the raw MPPI implementation, should it be state_.cvx instead? I tested that out, and that seems to work well to me (but don't see much difference using the bounded noises either). Admittedly, looking over Section 2 in your paper, I'm not quite seeing the proof of why it gets reduced to $u_t - 2 \epsilon_t$, then later simply $\epsilon_t$, in your eq. 1 -> 9 -> 10.

Edit: Though in this paper, I see the $\epsilon$ version you have, but with lambda instead of gamma, so I assume this pre-dates that work and might be out of date with current thinking: https://faculty.cc.gatech.edu/~bboots3/files/InformationTheoreticMPC.pdf

$$\lambda \sum_{t=0}^{T-1} u_t^\top \Sigma^{-1} \epsilon_t$$

... which is really annoying. I'm not sure which, either, or neither is right :facepalm:

tkkim-robot commented 1 year ago

... which is really annoying. I'm not sure which, either, or neither is right facepalm

This is one of the most confusing parts of MPPI, since they have 3 ICRA papers and 2 journal papers (T-RO, and the Journal of Guidance, Control, and Dynamics), all discussing the same topic. So I settled on the T-RO version, which is the last paper describing MPPI (title: "Information-theoretic model predictive control: Theory and applications to autonomous driving").

Decoupling lambda and gamma was introduced in the T-RO 2018 paper, and the version using lambda in the control cost is from the ICRA 2017 paper. So using gamma is one of the latest additions from GaTech. I have found it to be a simple yet effective technique for improving performance, so I have been using it throughout my research.

I'm not quite seeing the proof of why it gets reduced to $u_t - 2 \epsilon_t$, then later simply $\epsilon_t$, in your eq. 1 -> 9 -> 10.

This is simple. In the SMPPI paper:

should it be state_.cvx instead? I tested that out, and that seems to work well to me (but don't see much difference using the bounded noises either)

So, this is a correct observation. There should be no difference; they are exactly the same thing numerically. The reason why I used $\epsilon$ in my paper is to follow the latest convention of the T-RO paper.
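
To make the equivalence explicit: with $v_t = u_t + \epsilon_t$,

$$u_t^\top \Sigma^{-1} v_t = u_t^\top \Sigma^{-1} u_t + u_t^\top \Sigma^{-1} \epsilon_t,$$

and the $u_t^\top \Sigma^{-1} u_t$ term is the same for every sampled trajectory, so it cancels out when $\beta$ is subtracted (equivalently, it factors out of the softmax weights). Only the $\epsilon_t$ term actually differentiates the samples.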

So maybe just having a super low gamma would be ok?

Yes, this is OK. Gamma is just an empirically tuned parameter. If you choose to use gamma = 0.01-0.02 here, I will also check whether it is valid for SMPPI and let you know.

SteveMacenski commented 1 year ago

So, this is a correct observation. There should be no difference; they are exactly the same thing numerically. The reason why I used $\epsilon$ in my paper is to follow the latest convention of the T-RO paper.

Got it, that's sensible. You're right that it would not have an impact after $\beta$ is subtracted, but it does impact the relative costs of the control cost vs the state (e.g. critic) costs for balancing, which in turn impacts the tuning of $\gamma$. But with that excellent explanation, I can confidently move in a direction.

My tentative plan for tomorrow, if I get enough time, is to retune for squared critic functions so I can compare against the best performance I can get with the current setup. As part of that, I will play with values of $\gamma$ and cvx vs $\epsilon$. I agree that the $\epsilon$ representation is a "better" way to go than cvx, as I think it signals the intent more clearly.

Though, to be principled, I will probably need to start with a more basic setup and create some performance metrics to tell if / when the squared control cost term with large error washes out the linear / quadratic critic cost functions with large errors. This is mostly what will take a long time, and I probably can't do it pre-IROS, but I'm hoping there are clearly defined performance differences between linear vs quadratic so I can be a little more hand-wavey. If there aren't, then I'll need to make something quantitative to base a decision on.

What I mean by that is: since the term added in this PR is $v \cdot u$, if the current $v$s are larger than the $u$s, creating a large error, how does that balance against the critic functions having a linearly weighted error from desired vs a quadratically weighted one (since all of the critics are really just error functions where 0 cost occurs only at the optimal state for each)? I want to know whether, during large / small errors, we end up with smooth / unsmoothed behavior when balanced against different orders of critic functions, so that we have consistent performance over the range of possible values.

If, for instance, $\gamma$ is small but the weights on the critic functions are very large, this term wouldn't help smooth things (and vice versa). That's kind of where I feel this term in the integrated trajectory cost is principled but not complete: it needs some relative weighting applied to it, consistent relative to the state costs, so it can be applied consistently. I feel like MPPI should have this term not simply added to the state costs, but with a multiplication, or something normalizing the terms consistently relative to each other (e.g. $\omega_{control} \cdot \gamma u \Sigma^{-1} v + \omega_{state} \cdot critics$, or probably more realistically $\gamma u \Sigma^{-1} v + \omega_{state} ||S(V)||$, where I think the critics stuff is $S(V)$ in the text), so that no matter the absolute value of the summed critic costs, there's a weighting of the smoothness/informatics relative to the arbitrarily defined cost functions.

In MPPI there's nothing stopping you from having your gamma be 0.1 but the weights on the critics be 1e20, making this term virtually ineffectual. That shouldn't be the case, in my opinion: it should be applied consistently to all the trajectories irrespective of the proportional weighting. That's assuming my interpretation of the gamma term is accurate, which I think it is from the T-RO text: "A value in-between zero and one balances the two requirements of low-energy and smoothness." We should basically have an "energy proportionality of $S(V)$" term. That would make this whole thing much easier, since I could tune the critics relative to each other, and then tune the critics relative to the smoothness function, rather than having to retune everything in bulk.

Separate from that, I did some thinking today: I think the reason our lambda / gamma are so much smaller than you're used to is that our cost functions are linear, not quadratic. Since they're linear, the costs are more closely bunched together, so we need to take a smaller band of them to find the valid trajectory. If they were convex, the differences between the best and worst cases would be more defined, and we could widen the set of controls impacting our final result a bit. Also, I think it's because we have more, and more complex, cost functions going on, so small differences between top-performing trajectories are more important. But if we make the cost functions convex, I think we should re-enter the range of values you expect.
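
As a toy illustration of that bunching (illustrative numbers only): two trajectories with tracking errors of 1 and 3 have linear costs 1 and 3 but quadratic costs 1 and 9, so their softmax weight ratio goes from $e^{-2/\lambda}$ to $e^{-8/\lambda}$ for the same $\lambda$. Quadratic costs separate the best and worst samples much more sharply, which is consistent with a linear setup needing a smaller $\lambda$ / gamma to get the same selectivity.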

Thanks for the information, as always. Let me know if that sparks any thoughts!

tkkim-robot commented 1 year ago

That's an insightful opinion. I agree there are no generally accepted golden parameters for lambda and gamma - they are 100% adjustable. Plus, I agree that the difference may come from the linear vs quadratic critic functions.

In my case, I tune the weight of each critic function in $S(V)$ so that $\gamma$ ControlCost and $||S(V)||$ are of a similar magnitude. It is important to point out that the impulse-like conditional penalty critic (e.g., the 10,000 impulse cost for collision) should be - and only it should be - immensely larger than $\gamma$ ControlCost and the rest of $||S(V)||$. The "10,000" does not necessarily mean the penalty weight must be 10,000; it can come in other forms as well. When a sample gets the impulse penalty, we do not have to consider the control cost and the other parts; we just need to avoid it as much as possible. Then, finally, I start to tune lambda and fine-tune the other parameters.

SteveMacenski commented 1 year ago

I tune the weight of each critic function in $S(V)$ so that $\gamma$ ControlCost and $||S(V)||$ are of a similar magnitude

Is this perhaps something that could be more formalized in the MPPI method (it could be one contribution in a paper)? It's a small contribution, considering $S(V)$ is arbitrary - defining $\omega_{cost} S(V)$ is really still the same as $S(V)$ - but it's practically helpful when working with more than what amounts to a single critic function. Maybe there's a way to derive it more formally.

Understood on the impulse penalties; that's just like our collision cost penalty. In practice, it looks like the MPPI papers really only have a single consistent critic function penalizing for full speed, whereas we have many. For less trivial applications / constraints, it seems like some formalization of the weighting between the low-energy/smoothness balance and the state cost would be beneficial. We can "just do it" here as part of our $S(V)$ definition, but I think it's a useful thing to describe to others if we can give it more mathematical basis.

It's sounding more and more to me like a paper to the effect of Application of MPPI with Multi-Objective State Costs (of mobile robots, or for path tracking, or some specific task?) would be valuable, describing:

That doesn't feel like "a lot", since it's all engineering design rather than adding something new to MPPI theory. This is why it feels like a paper would need some mathematical contribution to the MPPI method, motivated by our application, with the rest added as experimental description / design fulfilling the application that motivated the MPPI variation. So as we're chatting in these tickets / email, let me know if some method variation comes to mind that could help our practical mobile robotics application - that could unlock the rest of this for an IROS 2023 paper.

Perhaps some variation to help with the unsmooth / jittery $S(V)$ critic functions we have here causing wobbly behavior :wink: Or something else, of course.

Edit: See the other ticket for some discussion on today's progress -- these PRs are getting warped together now. Right now, I'm working with this PR's code as the pared-down version of just the energy/smoothness term (which we'll apply over there eventually).

tkkim-robot commented 1 year ago

I temporarily closed #98 so that we can resolve the issues with gamma and the control cost first. @artofnothingness Please see the latest comments from @SteveMacenski in #98 about the ideas for the critic function.

It's sounding more and more to me like a paper to the effect of Application of MPPI with Multi-Objective State Costs (of mobile robots, or for path tracking, or some specific task?) would be valuable, describing:

I totally agree with this. It would be a valuable contribution even just to write down the discussions we have had here after we solve all the issues; some mathematical contribution would make it much more important. Some recent variants of MPPI focus on changing the sampling method to effectively avoid obstacles, but that approach is not suitable for our case. We need to narrow it down to the "critic function" part. Perhaps, as just a pop-up idea, we could begin with the goal critics and path critics to see whether there is any decent connection to the mathematical interpretation of MPPI. As mentioned by @SteveMacenski, we may argue that a quadratic speed cost is not acceptable for robots in crowded, human-filled spaces, and that these critic functions (maybe with a mathematical description) are good alternatives. But I haven't thought deeply about it yet.

SteveMacenski commented 1 year ago

So I'm able to make things quite a bit smoother if I use more points at a smaller model_dt, but with some retuning to try to maximize smoothness, it has lowered the maximum speed to about 70% of desired. I can't seem to find a way to get it higher without losing the gains in smoothness. See the updates on this branch.

Basically, without PathAlign higher, we skirt really close to obstacles, since the PathFollow and Goal critics push the robot deeper into the path around corners, causing "short cutting". But we can't really reduce PathFollow or Goal, since those are what drives forward progress.

Pushing PathAlign up makes the system follow the path pretty exactly - which keeps it away from skirting obstacles - but then it can't diverge or do "smooth" turnarounds with prefer-forward. Perhaps this can actually be tuned into balance, but it's a bit fragile, and it would be nice if we didn't have to rely on paths to avoid skirting up against collisions for more free-style navigation (e.g. "follow this object" tasks vs having a path plan).

Increasing the Obstacles critic doesn't really solve the problem, because we're already setting a very high value for in-collision trajectories, so we end up just skirting VERY close to collision (and then often into collision, with realistic following error) where it's technically valid. If we make the Obstacles critic costs high enough to compete with PathFollow / Goal, we drop our speeds and get very wobbly / unstable. Regardless, it doesn't seem to actually solve the problem: even with the Obstacles weight at 1e3, we still basically get right up against obstacles in turns - this result seems especially odd to me.

It seems like we need a new way to drive the behavior forward outside of the GoalCritic. If I try to push it higher, the entire thing starts to blow up, even when using only the Goal critic with no other critics - I'm not sure why it would be so unstable just by itself. We need something else to push the system forward so we can get up to speed, and then something to handle distance from obstacles more smoothly; those seem like the main tasks from here. I tried to tune the system with only the GoalCritic at 1st, 2nd, and 3rd order powers, playing with the weights, temperature, and lambda, and was still unable to get up to 90% speed smoothly. This tells me we need something else to drive the robot forward: if none of the reasonable orders for the critic function, nor the weights applied to it, nor the softmax/energy gains can do it, then adding more critics isn't going to do it either. Something's wrong with the Goal critic, its use, or the inputs to it for driving to full speed.

The goals remain: stay further from obstacles, follow the path (while being able to deviate / do those 3-point turns), and move relatively close to the maximum requested speed.

Minor edit: @artofnothingness do you have thoughts on how we can improve the collision avoidance behavior without the PathAlign critic? I have new thoughts, from an observation I made late this evening, on the speed / smoothness topic in the edit below.

I'm at this branch with these parameters:

    FollowPath:
      plugin: "mppi::Controller"
      time_steps: 56 #70  -- 80    -- 100 (autorally code)
      model_dt: 0.05 #0.04 -- 0.025 -- 0.02
      batch_size: 2000
      vx_std: 0.2
      vy_std: 0.2
      wz_std: 0.4
      vx_max: 0.5
      vx_min: -0.35
      vy_max: 0.5
      wz_max: 1.3
      iteration_count: 1
      prune_distance: 1.7
      transform_tolerance: 0.1
      temperature: 0.15              # 0.1, 0.15, 10
      gamma: 0.01
      motion_model: "DiffDrive"
      visualize: true
      TrajectoryVisualizer:
        trajectory_step: 5
        time_step: 3
      AckermannConstrains:
        min_turning_r: 0.2
      critics: ["PathAlignCritic", "ObstaclesCritic",  "GoalCritic", "GoalAngleCritic", "PreferForwardCritic", "PathFollowCritic", "PathAngleCritic"]
      GoalCritic:
        enabled: true
        cost_power: 1
        cost_weight: 5.0
      GoalAngleCritic:
        enabled: true
        cost_power: 1
        cost_weight: 3.0
        threshold_to_consider: 0.4
      ObstaclesCritic:
        enabled: true
        cost_power: 1
        cost_weight: 10.0
        consider_footprint: false
        collision_cost: 10000.0
      PathAlignCritic:
        enabled: true
        cost_power: 1
        cost_weight: 6.0
        path_point_step: 1
        trajectory_point_step: 3
        threshold_to_consider: 0.40
      PathFollowCritic:
        enabled: true
        cost_power: 1
        cost_weight: 3.0
        offset_from_furthest: 10
        max_path_ratio: 0.40
      PathAngleCritic:
        enabled: true
        cost_power: 1
        cost_weight: 2.0
        offset_from_furthest: 4
        threshold_to_consider: 0.40
      PreferForwardCritic:
        enabled: true
        cost_power: 1
        cost_weight: 7.0
        threshold_to_consider: 0.4
      # TwirlingCritic:
      #   twirling_cost_power: 1
      #   twirling_cost_weight: 10.0

Edit - important discovery:

What's especially interesting to me is that if I just increase the speed limit from 0.5 m/s to something like 0.65 m/s, then with the same set of parameters things stably move at the 0.5 m/s it's able to achieve. Is it possible the lower-valued mean stable velocities we're seeing are related to the noise distribution or clamping? I would have expected higher speeds to be unstable in the same way when I push the critics that incentivize driving faster up higher. There's some interaction here I don't quite understand. Why does simply setting a higher max_vx solve it, and is that a possible source of a solution? It seems odd to me that just having a higher max velocity gives us stable behavior over a wide range of values, while incentivizing the driving critic to go faster within a distribution doesn't. That, to me, points to an issue in how the distribution is being used / applied to the critics, more than it being a function of behavior from the critics themselves.

For example, with this branch, I showed that if I remove applyControlConstraints() after we add noises to the previous trajectory, use the raw un-clamped controls for the trajectories, and then clamp at the end after we update control_sequence_, things seem to work pretty well and hit full speed for most of it! It's not amazing yet - things haven't been tuned for this formulation - but it's pretty-freaking-good considering it's actually using the full speed for once! The remaining smoothness I think can be handled by tuning. My speculation for why that is:

We have some distribution around a zero mean to start, we find the best controls, then iterate. After a few iterations, we're clamping ~half of the distribution to max speeds, and those samples are scored similarly / poorly for whatever reason (maybe because, at max speed plus some wz component, those are the trajectories going off the edges). That's creating a ceiling for something (?) and suppressing values down, which then carries into all future cycles iteratively, and it cannot escape from it. When I ask a critic to incentivize those faster things, we're getting the trajectories on the edges of the noise distribution that go from that 70% of max speed closer to the maximum desired velocity. Since they're on the fringes, they'll be really noisy, making it unstable. If we don't clamp right away, those trajectories can do their thing more naturally, and then the softmax function comes in and ??? (this is where ML people with more intuition about softmax functions can chime in if this rings a bell), making use of the unbounded behaviors where those trajectories did well. Then we clamp the final values for execution.

My speculation is this is due to the softmax function, but I'm not a traditional ML guy, so I don't have a built-up intuition about its behavior to know whether what I'm describing is consistent with it. However, from this experiment, I can't see any other source of a virtual ceiling on the noised control behavior. It's clearly not critic related, and the only other operations are clamping and the update-control-sequence step with softmax -- everything else is just mathematical integration of the content as given.
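
For concreteness, a minimal sketch of the reordered loop I'm describing (function names here are illustrative stubs, not the actual mppic API):

```cpp
// Hypothetical sketch of the clamp-late variant -- illustrative names only,
// with empty stubs standing in for the real steps.
void generateNoisedControls() {}     // v = u_prev + eps, eps ~ N(0, std^2)
void integrateStateVelocities() {}   // roll out trajectories from raw samples
void evalTrajectoryCosts() {}        // critics score the unclamped rollouts
void updateControlSequence() {}      // softmax-weighted average over samples
void applyControlConstraints() {}    // clamp to [vx_min, vx_max], etc.

void optimizeOnce()
{
  generateNoisedControls();
  // NOTE: no applyControlConstraints() here -- samples may exceed vx_max
  integrateStateVelocities();
  evalTrajectoryCosts();
  updateControlSequence();
  applyControlConstraints();  // clamp only the final optimal control sequence
}
```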

So, questions for @tkkim-robot:

I didn't get a chance to test, but I think this would let me drop the Goal / PathFollow critics down a little, which would stop us from short cutting, since we don't need as strong of a "forward pull" anymore. That might help with the collision stuff above. We probably still need a refinement of the Obstacles critic, but I think this observation can be the thread that resolves the rest.

Thoughts? This seems like a plausible path forward to me. *slides over in chair* It's been a long day.

tkkim-robot commented 1 year ago

I have looked over your thoughts and results today. The bottom line is that this is valid, but it would be better to add some components here.

First, “clamping” the action values is definitely mandatory when there are physical limits on the actuator. For example, when controlling a servo motor, we should prevent action values larger than the maximum motor angle from being passed through the robot dynamics, because the resulting motions are not physically plausible. This is a “hard” constraint, so we should clamp the action values properly.

Similarly, as we have done before, we clamp the actions before evaluating them. This is reasonable, but at the same time it essentially violates the theoretical basis of MPPI. From both stochastic optimal control theory and information theory, the final form of MPPI originates from the assumption that the injected noises have multivariate normal distributions. But when we clamp the perturbed action values, the distribution is no longer normal: it becomes truncated, with the clipped probability mass piling up at the bounds.

Therefore, in this situation, MPPI might not output an optimal result (maybe the control cost and the KL divergence machinery all go wrong). Although I don't know how much it degrades the performance, theoretically it does. This is the first issue.

The second issue with action clamping is that the controller is disadvantaged when outputting a control value near the constraint boundaries. Let me give you a biased yet straightforward example:

When the optimal action value from the previous step was 0.5 m/s (equal to the max speed), the parallel samples would look like:

  1. 30% of the samples are 0.4 m/s,
  2. 40% of the samples are 0.5 m/s,
  3. 30% of the samples are 0.6 m/s, but they become 0.5 m/s after clamping.

If the critic function does not give a large difference between the 0.5 m/s samples and the 0.4 m/s samples, the resulting output must be smaller than 0.5 m/s after applying the softmax function.
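
A tiny standalone check of this example (illustrative numbers, not controller code): with nearly equal costs across the three groups, the softmax-weighted mean necessarily lands below 0.5 m/s once the 0.6 m/s samples are clamped:

```cpp
#include <cmath>
#include <cstdio>

// Illustrative numbers only: softmax-weighted mean of the clamped samples
// from the example above. 30% of samples at 0.4 m/s, 40% at 0.5 m/s, and
// 30% clamped from 0.6 m/s down to 0.5 m/s.
int main()
{
  const double v[]    = {0.4, 0.5, 0.5};    // sample values after clamping
  const double frac[] = {0.3, 0.4, 0.3};    // fraction of samples per group
  const double cost[] = {1.0, 0.95, 0.95};  // nearly equal critic costs
  const double lambda = 0.15;               // temperature

  double num = 0.0, den = 0.0;
  for (int i = 0; i < 3; ++i) {
    const double w = frac[i] * std::exp(-cost[i] / lambda);
    num += w * v[i];
    den += w;
  }
  std::printf("weighted mean: %.3f m/s\n", num / den);  // ~0.477 < 0.5
  return 0;
}
```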

That's creating a ceiling for something (?) and suppressing values down, which then carries into all future cycles iteratively, and it cannot escape from it.

This may result from the above two perspectives.

I think there are two possible solutions to this issue. The first would be using truncated normal distributions. Although this might somewhat alleviate the problem, it is still not a normal distribution.

The second would be penalizing constraint violations: imposing penalties on the samples that fall outside of the constraint boundaries (let me call this the constraint critic). This is a common idea, as can also be seen in this paper: https://ieeexplore.ieee.org/document/5980280.

What's especially interesting to me is that if I just increase the speed limit from 0.5 m/s to something like 0.65 m/s, then with the same set of parameters things stably move at the 0.5 m/s it's able to achieve.

I think what you did here was the second option, but with a zero penalty from the constraint critic. It's interesting - I also tried the same idea in #98, but I thought it was a hacky solution. We might be able to handle this issue reasonably well by employing the constraint critic.

In our case, the longitudinal speed is an action variable, but at the same time it is a state variable. Although the motor specs may dictate the maximum speed of a robot, there is no “physical” restriction on this value. My point is, we may remove the clamping before evaluation. After that, we clamp the entire action trajectory to ensure that all future optimal actions are within the available action range of the system.

Let me go back to the example:

  1. 30% of the samples are 0.4 m/s,
  2. 40% of the samples are 0.5 m/s,
  3. 30% of the samples are 0.6 m/s.

If a constraint critic function can penalize the 0.6 m/s samples with a magnitude similar to the penalty the 0.4 m/s samples get from the goal/path critic functions, then it will be easier to maintain a 0.5 m/s average speed. If the output exceeds 0.5 m/s, we can clamp it at the last step as you did here. If we can tune the controller to achieve full speed even with the constraint critic, it will be better than ‘just removing the clamping’, since it is more explainable.

Anyway, this is a great observation! Thank you for sharing your valuable results.

SteveMacenski commented 1 year ago

I agree that this feels like a sub-optimal choice, since we're scoring potentially infeasible trajectories that are then weighted in the lambda term, and creating out-of-bounds trajectory points that are "optimal" but then need to be clamped, which fundamentally changes the result. But that seems like the way forward, and we can smooth things with higher gamma/lambda values (and the constraint critic you mention).

action clamping is that the controller is disadvantaged when outputting a control value near the constraint boundaries.

I think this is what we're actually experiencing - you describe it better :smile:

If a constraint critic function can penalize the 0.6 m/s samples with a magnitude similar to the penalty the 0.4 m/s samples get from the goal/path critic functions, then it will be easier to maintain a 0.5 m/s average speed. If the output exceeds 0.5 m/s, we can clamp it at the last step as you did here. If we can tune the controller to achieve full speed even with the constraint critic, it will be better than ‘just removing the clamping’, since it is more explainable.

The problem here is that if we penalize the difference between 0.4 and 0.5, then we're incentivizing the robot to always go full speed, which then has problems stopping or slowing down when needing to reverse. I tried that; it's not good. Could we only penalize values outside the feasible range (e.g. only things > 0.5 m/s)? That way, even if we don't clamp the noised velocities, they're weighted lower when we compute the optimal trajectory for the cycle, so we rely on the lambda function to clip down toward valid ranges to some degree. We'd still need absolute clipping afterwards, but at least the optimization problem takes the bulk of the work. It also would not have that normal distribution spike you showed.

This is actually a half-bridge to the $V - V_{des}$ critic testing I was doing originally to get things to move faster - just now, instead of penalizing everything, it's only things out of bounds. I see no reason why this would cause issues with reversing and such either, which was the largest concern. Also, nicely, we can penalize not just out-of-bounds velocities but also accelerations if we choose, which should smooth things out further (though that's not the main goal right now). A sketch of the idea is below.
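
Roughly what I mean by an out-of-bounds-only penalty (illustrative, not the actual critic API; the weight / power names just mirror the critic params used elsewhere here):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative only -- not the actual mppic critic API. Penalizes samples
// only where they exceed the feasible velocity range, so in-bounds samples
// (including slow or reversing ones) are untouched.
void scoreConstraintCritic(
  const std::vector<std::vector<double>> & vx_samples,  // [batch][time step]
  std::vector<double> & costs,                          // one cost per sample
  double vx_max, double vx_min,
  double cost_weight, int cost_power)
{
  for (std::size_t i = 0; i < vx_samples.size(); ++i) {
    double penalty = 0.0;
    for (double vx : vx_samples[i]) {
      if (vx > vx_max) penalty += vx - vx_max;  // above the forward limit
      if (vx < vx_min) penalty += vx_min - vx;  // below the reverse limit
    }
    costs[i] += std::pow(penalty * cost_weight, cost_power);
  }
}
```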

tl;dr: Is this a good way forward?

I'll play with this today, but I only have ~1 hour I can give, so I'm not sure I'll get far.

Edit: Pushed the updates. Things are working pretty well here, though with the added constraint critic I typically see performance get much more unstable when its weight is ~1.0 (which I would have thought reasonable, being on the same scale as the others). I have it set around 0.3. I tested with and without this critic with no clamping at all (so it's free to go above 0.5 if it chooses). I found that this critic cuts the velocity overshoot above the threshold in half (0.56 m/s without the critic vs 0.53 m/s with it) without clamping; setting the weight to 0.6 drops it to 0.51 m/s. Though if I set higher values like ~1.0, it gets wobbly. So I think settling on something like 0.4-0.5 is a good way to reap most of the benefits without causing additional noise. Clamping now just handles the hard constraint enforcement, so the control_sequence_ optimum that comes out shouldn't be shifted very much by enforcement - this critic handles the bulk of it.

I'll need to do some more tuning based on these changes, but I'm pretty happy with what I'm seeing. Now that we're actually going full speed, I do think we need that 1% sampling in #96 to have the option to slow down closer to collisions. We can definitely push up the temperature now; 1.0 was too high, though it made smooth motion at full speed. Mostly, too-high lambdas in near-zero-mean situations (starting / reversing) make it really awkward: near zero speed there's a lot of jitter in the trajectories, some moving forward and some in reverse, and averaging too many of them together makes the robot just stall there for a hot minute.

Params
```
FollowPath:
  plugin: "mppi::Controller"
  time_steps: 56 #70 -- 80 -- 100 (autorally code)
  model_dt: 0.05 #0.04 -- 0.025 -- 0.02
  batch_size: 2000
  vx_std: 0.2
  vy_std: 0.2
  wz_std: 0.4
  vx_max: 0.5
  vx_min: -0.35
  vy_max: 0.5
  wz_max: 1.3
  iteration_count: 1
  prune_distance: 1.7
  transform_tolerance: 0.1
  temperature: 0.15 # 0.1, 0.15, 10
  gamma: 0.01
  motion_model: "DiffDrive"
  visualize: true
  TrajectoryVisualizer:
    trajectory_step: 5
    time_step: 3
  AckermannConstrains:
    min_turning_r: 0.2
  critics: ["ConstraintCritic", "PathAlignCritic", "ObstaclesCritic", "GoalCritic", "GoalAngleCritic", "PreferForwardCritic", "PathFollowCritic", "PathAngleCritic"]
  ConstraintCritic:
    enabled: true
    cost_power: 1
    cost_weight: 0.3
  GoalCritic:
    enabled: true
    cost_power: 1
    cost_weight: 5.0
  GoalAngleCritic:
    enabled: true
    cost_power: 1
    cost_weight: 3.0
    threshold_to_consider: 0.4
  ObstaclesCritic:
    enabled: true
    cost_power: 1
    cost_weight: 10.0
    consider_footprint: false
    collision_cost: 10000.0
  PathAlignCritic:
    enabled: true
    cost_power: 1
    cost_weight: 6.0 #10.0 good for relatively exact path following
    path_point_step: 1
    trajectory_point_step: 3
    threshold_to_consider: 0.40
  PathFollowCritic:
    enabled: true
    cost_power: 1
    cost_weight: 3.0
    offset_from_furthest: 10
    max_path_ratio: 0.40
  PathAngleCritic:
    enabled: true
    cost_power: 1
    cost_weight: 2.0
    offset_from_furthest: 4
    threshold_to_consider: 0.40
  PreferForwardCritic:
    enabled: true
    cost_power: 1
    cost_weight: 7.0
    threshold_to_consider: 0.4
  # TwirlingCritic:
  #   twirling_cost_power: 1
  #   twirling_cost_weight: 10.0
```
tkkim-robot commented 1 year ago

Clamping now just handles the hard constraint enforcement, so the control_sequence_ optimum that comes out shouldn't be shifted very much by enforcement - this critic handles the bulk of it.

This is reasonable.

The problem here is that if we penalize the difference between 0.4 and 0.5, then we're incentivizing the robot to always go full speed, which then has problems stopping or slowing down when needing to reverse. I tried that; it's not good. Could we only penalize values outside the feasible range (e.g. only things > 0.5 m/s)?

The goal/path critics will incentivize the robot to move faster and faster, am I right? So my point was, we should adjust the magnitude of the constraint critic's penalty according to the incentive from the goal/path critic functions, so that we make MPPI somewhat ascetic about going faster than 0.5 m/s despite the goal/path incentive.

So I think settling on something like 0.4-0.5 is a good way to reap most of the benefits without causing additional noise.

I think what you did here was the same thing.

All in all, everything is clear and reasonable! The results you mentioned also make sense. It seems like you have taken a step forward!

SteveMacenski commented 1 year ago

@tkkim-robot @artofnothingness I've finished this PR to the point where I think it's ready for you two to test and merge.

TODO Remaining:

I was able to reduce the jitter caused by PathAlign by using more time steps with smaller model_dts. It adds to the compute time, but it's still safely running at 50 Hz with these settings. The change to the obstacle critic to take the integrated cost instead of the instantaneously highest cost helps with the other major source of discontinuity. These two changes are the major ones creating the smoother behavior. Because we have smoother critic responses and can get up to full speed, I could also increase the temperature a little to give a bit of extra smoothing action. Overall, from a speed and smoothness perspective, I'm quite happy with these changes. Once we revive the SMPPI PR, it should be quite good indeed.

    FollowPath:
      plugin: "mppi::Controller"
      time_steps: 56
      model_dt: 0.05
      batch_size: 2000
      vx_std: 0.2
      vy_std: 0.2
      wz_std: 0.4
      vx_max: 0.5
      vx_min: -0.35
      vy_max: 0.5
      wz_max: 1.3
      iteration_count: 1
      prune_distance: 1.7
      transform_tolerance: 0.1
      temperature: 0.3 #TODO fine tune 0.2 - 0.4
      gamma: 0.015
      motion_model: "DiffDrive"
      visualize: true
      TrajectoryVisualizer:
        trajectory_step: 5
        time_step: 3
      AckermannConstrains:
        min_turning_r: 0.2
      critics: ["ConstraintCritic", "PathAlignCritic", "ObstaclesCritic",  "GoalCritic", "GoalAngleCritic", "PreferForwardCritic", "PathFollowCritic", "PathAngleCritic"]
      ConstraintCritic:
        enabled: true
        cost_power: 1
        cost_weight: 4.0
      GoalCritic:
        enabled: true
        cost_power: 1
        cost_weight: 5.0
      GoalAngleCritic:
        enabled: true
        cost_power: 1
        cost_weight: 3.0
        threshold_to_consider: 0.4
      ObstaclesCritic:
        enabled: true
        cost_power: 1
        cost_weight: 10.0
        consider_footprint: false
        collision_cost: 10000.0
      PathAlignCritic:
        enabled: true
        cost_power: 1
        cost_weight: 6.0 # 6, Originally 2.0
        path_point_step: 1 # 1, Originally 2, adds to compute time but I think it helps (?) Someone to sanity check me would be nice
        trajectory_point_step: 3
        threshold_to_consider: 0.40
      PathFollowCritic:
        enabled: true
        cost_power: 1
        cost_weight: 3.0
        offset_from_furthest: 10
        max_path_ratio: 0.40
      PathAngleCritic:
        enabled: true
        cost_power: 1
        cost_weight: 2.0
        offset_from_furthest: 4
        threshold_to_consider: 0.40
      PreferForwardCritic:
        enabled: true
        cost_power: 1
        cost_weight: 5.0
        threshold_to_consider: 0.4
tkkim-robot commented 1 year ago

I think this is happening because we're evaluating trajectories that aren't clamped yet and are weighting towards shortcutting.

I'm not sure, but I think the shortcutting may not originate from "not clamping". The rotational movements do not seem heavily related to the clamping issue. Don't they stem from the path_angle_critic and goal_angle_critic?

Anyway, I think this is great and ready to be merged. Once this PR is merged to dev, I will revive the SMPPI PR and merge dev into the smoothmppi branch.

SteveMacenski commented 1 year ago

I will say, after a bit more testing today (I'd really like to walk away to IROS with this ready to rock and roll!), I did find that after reverting back to using the single highest cost in a trajectory, it avoids obstacles to a similar degree as before, but it makes the motion more jerky. I'm trying to see if I can come up with a compromise that gets us both improved obstacle avoidance and smoothness. Either way, we need slightly better collision repulsion than we have right now.

I think the jerkiness comes from the previous formulation being cost / max_cost, so it's always between 0-1. When we multiply by the weight (1.65), cost ratios of 0.65 and larger result in values greater than 1 -- which is important because this function's power is squared. So trajectories with max costs greater than 165 are penalized more heavily, while for ones less than that, the squaring actually reduces the cost. That creates the separation.

Adding 1 to this would make sure all values are > 1 going into the squaring, which could use some investigation to see whether it's a viable solution. From some really quick testing with cost = cost_arg / max_cost;, the penalty pow(1.0 + cost * weight_, power_);, weight = 0.8, and power = 2, things seem to work better than before. I'm hoping that if I apply this same thing with the integrated costs and normalize by the lowest cost (e.g. subtract it from the rest), I can both get this behavior and be able to tune relative to the other critics - differentiating the trajectories without overpowering the other critics.
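
To make that separation concrete, a quick standalone comparison of the two forms (weights as discussed above; the ratio values are just illustrative):

```cpp
#include <cmath>
#include <cstdio>

// Illustrative comparison of the two penalty forms discussed above, over
// cost ratios (cost / max_cost) in [0, 1].
int main()
{
  const double weight_old = 1.65, weight_new = 0.8;
  const double power = 2.0;
  const double ratios[] = {0.2, 0.5, 0.65, 0.9};
  for (double ratio : ratios) {
    // old form: below the crossover ratio (1 / weight), squaring shrinks it
    const double old_form = std::pow(ratio * weight_old, power);
    // proposed form: always > 1, so squaring always amplifies
    const double new_form = std::pow(1.0 + ratio * weight_new, power);
    std::printf("ratio %.2f: old %.3f  new %.3f\n", ratio, old_form, new_form);
  }
  return 0;
}
```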

The single cost differentiates trajectories more, so ones close to obstacles are penalized more. Integrated costs score the trajectories more accurately, but then make them sufficiently similar that it's not helping much. If I push the weights up on integrated costs to differentiate them, then we have issues moving out of free space, because the optimization problem finds a local minimum where it doesn't want to move into costed regions at all (since that cost is so high in magnitude).

So I think I need to find a new way of differentiating the costs between trajectories such that they are more distinct - so I don't need to push the weights up too much relative to the other critics - but also with a more regular response.


Though, I don't think this is a blocker for this PR. Either way, this behavior needs to be improved, and it's a known thing we should work on regardless. The content in this PR is ready to go for the purposes we had here (smoothness + full speed).

SteveMacenski commented 1 year ago

Back working on this tomorrow with fresh eyes - back from Japan / up to speed again on things.

SteveMacenski commented 1 year ago

I believe I've found a solution; I'm working out the last few kinks. The summary is that I've refactored the obstacle critic to use distances (found via costs) rather than the costs themselves, and to penalize only close proximity to obstacles rather than the full range of costs (since if we're broadly in free space, we probably don't care about being exactly in the center). I've separated it into 2 terms: one penalizing harshly within a close margin to collision, to handle the most egregious problems of scraping next to collision, and another, weaker term to generally incentivize being further from obstacles (though I'm still flushing out whether that's even making a tangible impact).

This lets me punish near-collisions severely in a small radius (10-20 cm) without washing out the other terms, as would happen if it were applied globally to all distance ranges / costs. Then I can separately tune a term for general trajectories staying further from near-collisions for comfort, at a relative weighting to the navigation critics, within the margins around obstacles we'd actually care about (~1 m). Beyond ~1 m, this term applies no cost, since collision is not a concern there.

The thinking is that we don't necessarily care about being in the center of spaces if the path plan didn't send us that way, so we're just going to punish moving into ranges that are problematic rather than trying to optimize for the exact best distance from obstacles. I think that should free up a lot of the issues w.r.t. the jitter this term introduces as well. So far, I'm seeing the best results yet, but there's still some experimentation to do to find the best formulation without making it hyper-sensitive to input parameters. Plus, goals requested in high-cost zones aren't quite reaching their goal position smoothly, so I might need to tune or refactor something else (e.g. penalize obstacles less on approach to the goal).
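
A rough sketch of that two-term shape (all distances and weights are placeholders for illustration, not the final tuning):

```cpp
// Rough sketch of the two-term obstacle penalty described above; distances
// and weights are placeholders, not the final tuning.
double obstaclePenalty(double dist_to_obstacle /* meters, via cost lookup */)
{
  const double collision_margin = 0.15;  // harsh zone, ~10-20 cm
  const double comfort_range = 1.0;      // weak comfort zone, ~1 m
  const double harsh_weight = 20.0;      // placeholder
  const double comfort_weight = 1.0;     // placeholder

  double cost = 0.0;
  if (dist_to_obstacle < collision_margin) {
    // severely punish scraping right next to collision
    cost += harsh_weight * (collision_margin - dist_to_obstacle);
  }
  if (dist_to_obstacle < comfort_range) {
    // weakly incentivize keeping some distance from obstacles
    cost += comfort_weight * (comfort_range - dist_to_obstacle);
  }
  return cost;  // zero beyond ~1 m: free space is not penalized
}
```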

It's funny how much a few weeks of distance from a problem can change your viewpoint.

SteveMacenski commented 1 year ago

@artofnothingness can you review this now? I think it's ready for feedback.

artofnothingness commented 1 year ago

@artofnothingness can you review this now? I think it's ready for feedback.

Sorry, I was quite busy recently and my home PC crashed. I'll try to take a look at this soon.

SteveMacenski commented 1 year ago

I merged in the bug fixes, unit tests, and doxygen entries from #104

Running quick CI on a Nav2 branch to get some coverage metrics for staging: https://github.com/ros-planning/navigation2/runs/10002533028. Still more work to do, but I just wanted to see how close I am to 92% from my 2 weeks of unit test writing :laughing:.

It's 91.3%. There are a couple of coverage gaps it showed me, but all are pretty easy to fill.

SteveMacenski commented 1 year ago