artofnothingness / mppic

MIT License
84 stars 19 forks

Achieving full speed on straight-a-ways #80

Closed SteveMacenski closed 1 year ago

SteveMacenski commented 1 year ago

See https://github.com/artofnothingness/mppic/pull/77#issuecomment-1178138581

A good place to start is a critic to weight going the full speed, with probably a low weight to just incentivize but not force.

SteveMacenski commented 1 year ago

@artofnothingness any thoughts on this one?

The most obvious thing to try is just penalizing the difference between the maximum speed and either the speeds in the trajectory or the speed of the current state. I believe the latter is what "Aggressive Driving with Model Predictive Path Integral Control" does in Eq. 35.

I have a prototype I got started with this afternoon in the ~45 minutes I had which uses the state information, though some TODOs:

https://github.com/artofnothingness/mppic/tree/vel_critic
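For reference, the shape of critic I'm describing is roughly this (a Python sketch of the idea only; `velocity_critic_cost`, `v_max`, and `weight` are illustrative names, not the branch's actual API):

```python
import numpy as np

def velocity_critic_cost(trajectory_vxs, v_max=0.5, weight=5.0):
    """Score each sampled trajectory by its gap to the max speed;
    lower cost = closer to full speed. A low weight incentivizes
    without forcing."""
    # trajectory_vxs: (batch, timesteps) array of linear velocities
    return weight * np.mean(np.abs(v_max - trajectory_vxs), axis=1)

vxs = np.array([[0.5, 0.5, 0.5],      # trajectory at full speed
                [0.25, 0.25, 0.25]])  # trajectory at half speed
costs = velocity_critic_cost(vxs)     # the slower trajectory costs more
```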

SteveMacenski commented 1 year ago

This seems to be one of the last tickets we can do before another set of hardware testing evaluations

SteveMacenski commented 1 year ago

Fixed those issues -- now it's just down to the actual `data.cost` function design.

I pushed some updates using the mean of the vxs, which seems to work if I set the weight to something high (~20-100). But when the robot is going faster with this critic encouraging higher translational speeds, it seems less stable / more wobbly. Which is obviously problematic.

SteveMacenski commented 1 year ago

Try next:

SteveMacenski commented 1 year ago

Averages, min, and max all fail.

But penalizing the summed difference at each trajectory point works almost too well (needs normalization?), and penalizing only the first trajectory point works pretty well too, so a couple of potentially good paths forward exist here. Working on it tomorrow.

The main issue now is that because we penalize the difference between the desired velocity and the actual velocity, it doesn't allow reversing (or even slowing down) and causes collisions: the penalty is so strong that the robot doesn't want to slow down. This happens even with a penalty weight of only 1.0 in the units of $\delta v$.

By the time the weight is large enough to meaningfully speed up the robot, it's also too large to allow it to back up or slow down.
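A toy numerical example of why this happens (made-up numbers, not the actual critic): even at a weight of 1.0 the per-point penalty sums over the whole horizon, so any trajectory that brakes pays a large cost that the obstacle/goal critics have to beat:

```python
import numpy as np

v_max = 0.5
speed_weight = 1.0  # even a weight of only 1.0, in units of delta-v

def speed_penalty(vxs):
    # Sum of per-point differences from the desired (max) speed
    return speed_weight * np.sum(v_max - vxs)

keep_speed = np.full(20, 0.5)           # holds v_max the whole horizon
slow_down = np.linspace(0.5, -0.2, 20)  # brakes and reverses

# Braking pays at every single trajectory point, so to permit slowing
# down, another critic must outweigh the entire summed penalty.
```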

SteveMacenski commented 1 year ago

Mhm, this seems odd. If I increase the weights of the existing critics, without any critic specifically targeting speed, I get higher speeds out. This seems like an odd result since I just made them all proportionally higher. I see the same result if I increase the power of the cost from linear to quadratic. Admittedly the system becomes a bit less stable (perhaps it needs retuning, or changes to the softmax gain?).

@artofnothingness do you have any thoughts on this? Perhaps the right move isn't to penalize speed at all, but to give the existing critics higher penalties. Do you know why higher (but proportionally identical) weights would select faster trajectories? That would solve a great deal of this issue if we don't need to create a specific new critic.

I found this result by trying the critics one by one and realizing that none of them individually is the reason it's going slower. If I use the GoalCritic alone, I would expect it to drive near full speed, but in fact it drives at the exact same speed. If I increase the weight or the power, then the speed increases (and the system also becomes a bit unstable, which is odd).
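For what it's worth, here is a small numerical sketch of why proportionally scaled costs aren't neutral (assuming the usual $e^{-cost/\lambda}$ softmax form; the cost numbers are made up): scaling every cost by a constant $k$ is mathematically identical to dividing the temperature by $k$, so it sharpens the weighting toward the cheapest (fastest) trajectories:

```python
import numpy as np

def softmax_weights(costs, temperature):
    # Normalization step: shift so the best trajectory has cost 0
    costs = costs - costs.min()
    w = np.exp(-costs / temperature)
    return w / w.sum()

costs = np.array([1.0, 5.0])                    # trajectory A (fast), B (slow)
w_base = softmax_weights(costs, 0.35)
w_scaled = softmax_weights(10.0 * costs, 0.35)  # same ratio, 10x magnitude

# Scaling all costs by 10 concentrates far more weight on the cheap
# trajectory -- exactly equivalent to dividing the temperature by 10.
```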

artofnothingness commented 1 year ago

Hmm. This sounds weird.

It could be that the temperature parameter makes the weighting less cost-sensitive, so we get values closer to a random average rather than ones that minimize the cost function

Did you try changing the temperature parameter?

artofnothingness commented 1 year ago

I think we need to dig into the original papers. This implementation has a lot of invented, experimental stuff.

SteveMacenski commented 1 year ago

Did you try to change temperature parameter ?

Just tried that. Yes, if we lower the value from 0.35 -> 0.1, we go from 75% of the max speed to 90%. If I lower it to near-zero (but non-zero), then it's virtually butting up against the max speed value (but with quite a large variance of values, I assume due to destabilization).

I see a normalization step before the softmax, so if costs are proportionally the same, the outputs should be similar? (e.g. cost A is 1 and cost B is 5, vs. cost A is 10 and cost B is 50). The exponential function probably warps that a bit, but it seems rather silly that higher cost values, still proportional to each other, would impact things that much. Perhaps it's worth seeing if we can retune the system using squared costs? Just squaring the current weights makes it unstable, but at least it moves closer to max speed.

I'm perfectly happy if we don't hit the full max speed at every iteration, but I'd like to at least be in the right ~5-15% ballpark. Lowering the temperature doesn't seem wise though; that is definitely very destabilizing, and some weighted averaging is useful. What do you think? It seems like our options are (1) higher powers, (2) higher weights, (3) lower temperatures.

Perhaps additionally a "smoothing" critic to handle destabilization (maybe penalize large $\delta v$ between timesteps in the batches, or penalize rolling averages of timestep velocities over some window size for each trajectory in the batch) would be required after 1/2/3 to get the speeds higher and stable, rather than a desired-speed critic. I think designing a desired-speed critic is a losing battle due to the need for reversing / stopping (unlike the MPPI paper's vehicle, which just goes around a race track indefinitely).
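A minimal sketch of what that smoothing critic could look like (illustrative names and weights only, not a worked implementation):

```python
import numpy as np

def smoothness_cost(vxs, weight=1.0):
    # vxs: (batch, timesteps); penalize velocity jumps between timesteps
    dv = np.diff(vxs, axis=1)
    return weight * np.sum(dv ** 2, axis=1)

smooth = np.linspace(0.0, 0.5, 10)[None, :]       # steady ramp-up
jitter = smooth + 0.05 * (-1.0) ** np.arange(10)  # same ramp + wobble

# The wobbly trajectory pays more without capping the overall speed,
# which is the point: jitter is penalized, not velocity itself.
```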

SteveMacenski commented 1 year ago

I went over the autorally code from the MPPI authors and agree that what they have is the same as what you've done in the updateControlSequence function.

The only thing they do after that point is apply a Savitzky–Golay filter: https://github.com/AutoRally/autorally/blob/c2692f2970da6874ad9ddfeea3908adaf05b4b09/autorally_control/include/autorally_control/path_integral/mppi_controller.cu#L416-L446. This paper characterizes it. It doesn't help with full speed, but it does help with jitteriness, so my assertion is that if we apply it, we can decrease the temperature value.

They use a gamma value of 0.15 to our current 0.35. I showed that dropping to around 0.15 gets us to around 85% of the max speed (on average, sometimes up to 90%), if we can stay stable with that kind of control smoothing over iterations. That's pretty in line with the MPPI paper too.

What do you think? It wouldn't require any retuning or messing with critics, just smoothing a local buffer of recent commands. This actually seems pretty sensible: the softmax is over-smoothing us, so we reduce that to use a narrower band and then smooth the output at a second level.
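A sketch of that second-level smoothing using SciPy's Savitzky–Golay implementation (the buffer contents, window length, and polynomial order here are placeholders, not tuned values):

```python
import numpy as np
from scipy.signal import savgol_filter

# A short buffer of recently computed velocity commands (made-up values)
history = np.array([0.48, 0.52, 0.47, 0.53, 0.49, 0.51, 0.50])

# Fit a cubic over a 7-sample window; the filtered sequence keeps the
# trend but strips the sample-to-sample jitter, so the softmax
# temperature can be lowered without passing that jitter to the base.
smoothed = savgol_filter(history, window_length=7, polyorder=3)
```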

artofnothingness commented 1 year ago

That's a great observation. We should implement this. A high temperature is definitely not the greatest way to get a smoother trajectory

SteveMacenski commented 1 year ago

So the MPPI in AutoRally applies the SGF filter to the control output, but the paper recommends doing something else (after reading it now in more detail).

They claim the filter reduces convergence; their argument is relatively intuitive, though I'm not sure it really matters for our situation. But they propose something that's more theoretically grounded in the MPPI framework, so I figure that's good to utilize. Their choice of language around "i-axis" and "t-axis" is pretty annoying though. As you're reading: the i-axis compares the optimal control from the last iteration with the current one, and the t-axis runs along the trajectory points of the batches. If Figure 6 is to be believed, their method is quite a bit more stable than just applying the SGF filter to the control output of MPPI. Mostly for this reason I read the paper and tried to understand what we'd have to do to try it out.

Can you take a look over this paper and let me know if this aligns with your understanding as well? I think we need to:

The method also shows $a_t^k$ being used in the dynamics model (ours is just a pass-through), so I'm wondering whether that's a "key" element, since we're not using a learned model, or whether we're safe because the real impact comes from the cost-function term that minimizes the action space separately from the control space. I suspect it's the second. From a mathematical perspective, I think tracking $a$ between cycles stabilizes successive iterations, and minimizing the change of $a$ within the cost function incentivizes smoother points along each trajectory. I'll say the paper lacks a concise sentence (unless I missed it) about how exactly this change smooths over both axes, so that's my conclusion, not the paper's.

Though, I wonder if we can't just add a smoothing critic $w \cdot (v_i - v_{i-1})$ rather than solving the problem through a direct objective function. I'd have to think they tried that first and wouldn't have written a paper if it helped. I suspect such a critic basically incentivizes not moving or not being very reactive, so maybe that's why they describe their method as "up-lifting" whereas I think the impact of that critic would be "suppressing" (?). There would definitely be constant fighting between that smoothness critic and whatever drives the robot to move forward (goal / path follow), so it would need to be carefully tuned to work well -- which may or may not be possible. But heck, I'll give it a try tomorrow and see if it works.

~If you agree with my overview of $a$, would that be something you'd be open to prototyping? That's pretty xtensor heavy. I'm sure I could handle it, but I suspect you'd do it in a fraction of the time.~ Actually, if this is going to end up in Nav2, I really should get more comfortable with these APIs. I'll play around tomorrow or Thursday and let you know if I run into problems. I'd still appreciate your take on the paper to make sure I've interpreted the needed changes correctly.
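For anyone following along, here's my toy reading of the action-lifting idea in Python (this is only my interpretation of the paper, with made-up dimensions and noise parameters): noise is sampled in the control space and integrated into the action space, so the commands the robot executes are smoothed along the t-axis by construction, and a cost on action differences handles the rest:

```python
import numpy as np

rng = np.random.default_rng(0)
dt, T = 0.05, 30

# Sample noise in the control ("input-lifted") space...
u = rng.normal(0.0, 0.5, size=T)

# ...then integrate to get the actions actually sent to the robot:
# a_t = a_{t-1} + u_t * dt, so per-step noise is smoothed out along
# the t-axis by the integration.
a = np.cumsum(u) * dt

# A cost term on action differences then smooths the i-axis as well
action_smoothness_cost = np.sum(np.diff(a) ** 2)
```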

SteveMacenski commented 1 year ago

I pushed the change to the following branch: https://github.com/artofnothingness/mppic/tree/smoothmppi

I'm still playing around, but this seems correct to my current understanding of the paper. I know the changes to updateControlSequence aren't great; I'm focusing on styling after things are generally working. Things are compiling and running though, and I think I'm a lot more comfortable with the xtensor APIs than I was a couple of months ago :laughing:

I think the trajectories were being generated from the control velocities rather than the "action" velocities, so the trajectories being scored weren't directly analogous to the actual velocities sent to the base (since those are now "action"s). I've gone through and updated the motion model to use avx instead of cvx, which then updates state.vx for trajectory generation, and it seems to work better.

Now the issue I'm running into is that the system's behavior is nutty: it's relatively smooth in motion (though with wobbly oscillations), but it seems to ignore the penalty functions for actually stopping at the goal or following the path (among others; it seems like it's on drugs).

The optimizer regularly has to be reset because the trajectories aren't very responsive to obstacles - but that could have something to do with the virtually suppressed normal distribution keeping the samples close together (e.g. before, vx had a distribution applied to it directly; now it's N + 0.1 * vx, which scales back its effect on output trajectories). When I raise the stds, things go super haywire. Because this incentivizes "smoothness", when we are turning it wants to keep turning, which seems not great - resulting in some wavy trajectories where it tries to continue turning, then overcorrects, and so on. Maybe it's a tuning thing, but I tried some orders-of-magnitude sweeps of values and didn't (at least immediately) see a change in that behavior.

I'm not sure exactly what's going on there, but at least it's moving around, which is progress :+1:. The branch is up to date with my work today; please take a look and let me know if you make any progress. I'm not sure if I'll get a chance to continue this tomorrow, and I'm on vacation on Friday. I put in ~4 hours on this today to get this far!

SteveMacenski commented 1 year ago

Well, I have it kind of working, but I'm not sure why.

If I don't clamp cvx (and y, theta, but I'll just talk about the xs here for brevity), things are unstable. If I clamp cvx right after it's generated from the noised controls, it's also not stable. But if I clamp it after the action avx is computed in generateActionSequence(), things seem to work out fine... So the avxs are based on cvxs which are not yet clamped, yet clamping them afterward for use in the softmax evaluation works... which I don't feel it should.

But if I clamp the actions avx to be within valid velocity bounds, it's also unstable. So... that's super weird. I have no idea what's going on, or whether this is representative of some bug or misunderstanding on my part. It feels like I've stumbled onto something that kind-of-works-ish but shouldn't, based on something else being wrong that needs to be corrected.

I could use a second pair of eyes at this point.

There are TODOs all over the place to clean up the code, but the general branch above "works" now, though I suspect there's something wrong.


It's pretty smooth with a temperature of 0.15 now. Though, if I preempt a navigation request with a new one (such as making the robot drive straight up, letting it go for a bit, then sending it straight down), it takes a long time to react to the new path. In some cases it goes a little nutty, ends up in a bad situation, and needs to reset the optimizer. It doesn't surprise me greatly that there's some delay, but this is a problematic amount. Another problem for another time though, after this is working, to figure out what makes the most sense there.

SteveMacenski commented 1 year ago

Still working through the issues in action space, but see the work on post-filtering with the Savitzky–Golay filter in the smoothmppi_w_smoothing branch. It's largely working, though I'm not incredibly impressed: it does a decent job, but not quite good enough to drop the softmax temperature low enough to get within 10% of max speed. I'll look into implementing more coefficients later if we can't get the other option working first. I did this mostly as a backup / something to compare against.

SteveMacenski commented 1 year ago

@artofnothingness the #99 PR is finally ready for you to give a review on

SteveMacenski commented 1 year ago

@artofnothingness the https://github.com/artofnothingness/mppic/pull/99 PR is finally ready for you to give a review on... but for real this time. I'm completely done and happy with the results. Just waiting on your thoughts.

SteveMacenski commented 1 year ago

#99 has been confirmed to resolve this on hardware