cra-ros-pkg / robot_localization

robot_localization is a package of nonlinear state estimation nodes. The package was developed by Charles River Analytics, Inc. Please ask questions on answers.ros.org.
http://www.cra.com

Differential parameter - suspicious calculation of the velocity covariance #356

Open mikjh opened 7 years ago

mikjh commented 7 years ago

So I saw that (#261) touched on this issue, but I couldn't see whether this exact problem got addressed.

So to my question: how is the covariance calculated when the differential parameter is true, and how was this derived? The expression can be found here.

To me the current implementation seems incorrect. Firstly, because the covariance of a velocity should have the unit [m^2/t^2], whereas the current one has [m^2 * t]. Secondly, the calculated velocity covariance should increase with decreasing time, not decrease as it does now (just visualize two position-uncertainty ellipses drifting apart: the spread of the slopes of the vectors going from one ellipse to the other converges).

Thoughts: so if we calculate the measurement estimate as v_t = (x_t - x_{t-1}) / dt, the covariance should be P_{v_t} = (P_{x_t} + P_{x_{t-1}}) / dt^2, and not the current (P_{x_t} + P_{x_{t-1}}) * dt. (This follows directly from Var((x_t - x_{t-1}) / dt) = Var(x_t - x_{t-1}) / dt^2 = (Var(x_t) + Var(x_{t-1})) / dt^2.) Please tell me if I am overlooking or oversimplifying something.
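To make the dimensional argument concrete, here is a small Python sketch (an illustration only, not the package code; the function names are made up) comparing the current formula with the proposed one:

```python
def vel_var_current(p_prev, p_curr, dt):
    # Current implementation: (P_x_t + P_x_{t-1}) * dt, units m^2 * s.
    return (p_prev + p_curr) * dt

def vel_var_proposed(p_prev, p_curr, dt):
    # Proposed: (P_x_t + P_x_{t-1}) / dt^2, units m^2 / s^2.
    return (p_prev + p_curr) / dt ** 2

# As dt shrinks, the proposed variance grows (a velocity differentiated over
# a tiny interval is very uncertain), while the current formula shrinks:
grows = vel_var_proposed(0.1, 0.1, 0.01) > vel_var_proposed(0.1, 0.1, 0.1)
shrinks = vel_var_current(0.1, 0.1, 0.01) < vel_var_current(0.1, 0.1, 0.1)
```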

Thanks.

ayrton04 commented 7 years ago

Thanks very much for reviewing the code!

Yes, I can definitely see why your formulation is correct. When I first added differential mode, I tried Σ^2/dt (not Σ^2/dt^2, admittedly), but, as you pointed out, the covariances ended up being larger - massive, in fact, for small dt. I found that convergence was slow, and that if there were multiple sources of velocity data, the differentiated pose data would be trusted less than potentially lower-accuracy data (e.g., wheel encoders), even when the pose data was much more accurate (e.g., laser scan-generated poses).

Regardless, as you pointed out, multiplying a random variable by a constant means we multiply its variance by the square of the constant:

var(Y - X) = var(Y) + var(X)
var((Y - X) / dt) = var(Y / dt) + var(X / dt)
                  = (var(Y) / dt^2) + (var(X) / dt^2)
                  = [var(Y) + var(X)] / dt^2

...so I agree that this should be fixed.

Any chance you'd like to submit a PR?

mikeboss90 commented 7 years ago

Thanks for the answer. Sure, I can submit a PR for this. It will be my first PR, so bear with me if something goes wrong.

Though, how do we deal with the convergence problems when we divide by dt^2? The calculated velocity covariance will get even bigger than when dividing by dt (since dt will typically be less than 1). Do you have any specific test bag files or a test case for the described scenario (laser scan + velocity-frame data)?

update note: this is the same account as mikjh. Accidentally logged in with the wrong account.

ayrton04 commented 7 years ago

It's a tough call. Consider the case where we have a pose measurement at time 0.02 with a variance of 0.1, and a measurement at time 0.06 with a variance of 0.1 (so dt = 0.04). Then we'd have

output_var = (0.1 + 0.1) / (0.04 * 0.04) = 125
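Checking that arithmetic in a couple of lines of Python (an editor's sketch of the numbers in this comment):

```python
# Two pose measurements 0.04 s apart, each with variance 0.1,
# differentiated into a velocity using the proposed /dt^2 formula.
dt = 0.06 - 0.02
output_var = (0.1 + 0.1) / (dt * dt)  # ~125 (m/s)^2
```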

We'll have to think about the best way to address this.

mikjh commented 7 years ago

I think the problem with the constructed example is that we cannot be so sure about the pose-differentiated velocity in this case (for small velocities).

So if we have position estimates p_0 and p_1, each with a covariance P_p of 0.1, then basically we can only say for sure that p_{0,1} = E(p_{0,1}) +- sqrt(0.1). And since sqrt(0.1) ~= 0.3, that means we need abs(p_0 - p_1) > 0.3 + 0.3 = 0.6, i.e., the two positions must be at least 0.6 m apart before we can safely say that p_0 and p_1 are two distinct positions.

Then, if we assume a constant speed v going from p_0 -> p_1, we at least have to travel 0.6 m in 0.04 s => 0.6 / 0.04 = 15 m/s (~ 54 km/h) to be able to make this distinction. But even then we have a big uncertainty, namely v = 15 +- sqrt(125) = 15 +- 11.

I guess there is a limit of how reliable speed information you can get from differentiating already uncertain positions. For the above case, if we got those pose estimates at those times, the velocity output should have a very high covariance for low speeds, as the uncertainty ellipses overlap and we can't really deduce anything.
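The back-of-the-envelope numbers above (which use rounded 1-sigma figures) can be sketched like this:

```python
import math

# Reproducing the comment's arithmetic: two poses with variance 0.1 each,
# measured dt = 0.04 s apart.
pose_var = 0.1
dt = 0.04

sigma_p = math.sqrt(pose_var)        # ~0.316 m per pose estimate
min_separation = 2 * sigma_p         # ~0.63 m before the 1-sigma intervals separate
min_speed = min_separation / dt      # ~15.8 m/s needed to tell the poses apart

# Spread on the differentiated velocity under the /dt^2 formula:
sigma_v = math.sqrt((pose_var + pose_var) / dt ** 2)  # ~11.2 m/s
```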

The question is though, as you mentioned, how to make it reasonable in comparison with the other data. Was the raw measurement noise of the laser scanner data well tuned when you compared it to other sensor sources?

ayrton04 commented 7 years ago

I guess there is a limit of how reliable speed information you can get from differentiating already uncertain positions.

This is where I struggle. I feel like the error on the computed velocity, assuming a constant time step, should be invariant to the error in the pose.

Let's consider only odometry for now, and only for the X axis. Imagine that we start driving straight forward at a constant velocity of 0.5 m/s. After one minute, our wheel encoder odometry produces an X position of 30.0 with a variance of 1.0. One second later, it has a position of 30.5 with a variance of, e.g., 1.04. According to the current formula, the delta in X position would be 0.5, and the variance would be 1.0 + 1.04 = 2.04.

After two minutes, our X position is now 60.0, with a variance of, e.g., 2.0. One second later, it has a position of 60.5, with a variance of 2.04. If we take the delta in the positions and again use the formula we're discussing, we would get a delta in X position of 0.5 with a variance of 2.0 + 2.04 = 4.04.

This doesn't make sense to me. In each case, the robot is traveling for the same amount of time and with the same velocity, and so the errors on the deltas between the measurements should, at least to me, be the same.

Jumping back to an example that was once presented to me, imagine you have a box of cereal, and it contains A amount of cereal, with a variance of sigma_A. You then remove an amount B from the cereal, with a variance sigma_B. The final amount, F, that you have in the cereal box would be A - B, with a variance sigma_A + sigma_B. We therefore established these two relationships:

F = A - B
sigma_F = sigma_A + sigma_B

So the final quantity has a variance of sigma_A + sigma_B.

However, that is not what we are computing here. We are computing the amount of cereal that was removed, because we are given F (the current state) and A (the previous state), along with their respective variances. However, those original relationships would still hold; we're just rearranging them:

F = A - B
sigma_F = sigma_A + sigma_B
B = A - F
sigma_B = sigma_F - sigma_A

So the error on the delta (the change in cereal amount) would be the difference in variances between the final quantity and the starting quantity. In other words, the measurement of the final amount - which we are given, rather than calculating - already accounts for the fact that its variance is really sigma_A + sigma_B. We're just backing out what sigma_B is.

I realize I may be off-base here, but it seems to me that we are given the following quantities:

X (first measurement)
sigma_X (error on first measurement)
X + Y (second measurement)
sigma_X + sigma_Y (error on second measurement)

...and we want to compute Y and sigma_Y, which are really (X + Y) - X and (sigma_X + sigma_Y) - sigma_X.
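The two bookkeeping conventions in this comment can be made concrete with a few lines of Python (an editor's sketch, using the hypothetical odometry variances from the example above):

```python
# (variance before the delta, variance after the delta)
# at minute 1 and minute 2 of the run.
pairs = [(1.00, 1.04), (2.00, 2.04)]

# Independent-pose assumption: sum the variances.
independent = [a + b for a, b in pairs]  # 2.04, then 4.04 -- grows with the run
# Dependent (cumulative) interpretation: back out sigma_B = sigma_F - sigma_A.
dependent = [b - a for a, b in pairs]    # ~0.04 both times -- same motion, same error
```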

mikjh commented 7 years ago

It's not clear whether a wheel encoder is the best example for this (dependent measurements?), as it seems to have an accumulating variance.

For the example, we have something like this (let's hope my figure renders correctly):

|--x0--|<---------->|----x1----|<------>|------x2------|<-->|--------x3--------|
           d0min                  d1min                 d2min
<-------------d0max------------>        <-----------------d2max---------------->
                    <--------------d1max--------------->                     
    <--------d0x---------><--------d1x---------><---------d2x--------->

where we then only know that d{i}min < d{i}x < d{i}max; the larger the uncertainty of the positions x{i} grows, the less restricted d{i}x is.

I'd say from the above visualization that the variance of the delta d{i}x between pose estimates is dependent on their covariances, and that this does make sense. Further, a velocity calculated from two measurements should then also be correlated with the uncertainty in the positions.

Please note that I did not visualize your example exactly, because then the x{i} intervals would have had to overlap each other.

ayrton04 commented 7 years ago

So I totally agree that if we had two totally independent measurements, e.g., from a GPS, and each had some variance, then the difference between them would have the combined variance, as the true poses could be at the far extremes of the covariance ellipses.

But, as you stated, that's not always the case, especially when the poses are not independent. In my example above, if the robot had a variance of 1000 at time t1, and a variance of 1000.1 at time t2, then the variance of the delta would be 2000.1. The delta is equivalent to asking "how far did the robot travel between these two measurements," and if those measurements are only, say, 0.1 seconds apart, then it's kinematically infeasible for the robot to have an error of 2000.

Looking at it another way, odometry data is derived from encoder ticks. Between those two time steps, we can only have had so many encoder ticks, with some small error.

It seems to me that what we're getting at here is that the differential mode needs more information, which is how the pose measurement was derived. If each measurement is totally independent and not cumulative, then we'd compute the differential variance using one method, and another method for cumulative/dependent measurements. This could spell (yet another) parameter.

mikjh commented 7 years ago

Also, mathematically it checks out; the full formula for the variance of the difference of two random variables is

Var(X-Y) = Var(X) + Var(Y) - 2*Cov(X,Y)

, and only when X and Y are independent do we have 2*Cov(X,Y) = 0.

Using X = x1, Y = x2 = x1+v*t (as in your example) =>

Cov(X,Y) = Cov(x1, x1 + v*t)                          [using Cov(X,Y) = E(XY) - E(X)*E(Y)]
         = E(x1*(x1 + v*t)) - E(x1)*E(x1 + v*t)
         = E(x1^2) + E(x1)*v*t - E(x1)^2 - E(x1)*v*t  [since v*t is deterministic, E(x1*v*t) = E(x1)*v*t]
         = E(x1^2) - E(x1)^2
         = Var(x1).

So then we actually have: Var(x1 - x2) = Var(x1) + Var(x2) - 2*Var(x1) = Var(x2) - Var(x1).
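A quick Monte-Carlo sanity check of that result (an editor's sketch with made-up numbers), modeling the cumulative case as x2 = x1 + v*t plus an independent process-noise increment:

```python
import random

random.seed(0)
n = 200_000
v, t = 0.5, 1.0
inc_var = 0.04  # variance of the independent increment between the two poses

x1 = [random.gauss(0.0, 1.0) for _ in range(n)]                  # Var(x1) = 1.0
x2 = [a + v * t + random.gauss(0.0, inc_var ** 0.5) for a in x1]  # Var(x2) ~= 1.04

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Empirically, Var(x2 - x1) ~= Var(x2) - Var(x1) ~= 0.04,
# and NOT Var(x1) + Var(x2) ~= 2.04.
delta_var = var([b - a for a, b in zip(x1, x2)])
```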

I agree with you, the differential parameter needs more information. Could we assume that independent pose estimates go into the filter as pose messages, and that dependent, accumulative measurements go in as odom messages? Then we could calculate the velocity covariance in the two separate ways there.

Otherwise, as a solution, we could let the user set a "pose_dependency" parameter within [0,1], where 0 is independent and 1 is fully dependent, and then calculate the difference as Var(X-Y) = Var(X) + Var(Y) - pose_dependency*2*Var(X).
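As a sketch, the suggested parameter could look like this (pose_dependency is a hypothetical name from this thread, not an existing package parameter):

```python
def delta_variance(var_prev, var_curr, pose_dependency):
    """Blend between independent (0.0) and fully dependent (1.0) poses,
    per Var(X - Y) = Var(X) + Var(Y) - pose_dependency * 2 * Var(X),
    with X the previous pose and Y the current one."""
    if not 0.0 <= pose_dependency <= 1.0:
        raise ValueError("pose_dependency must be in [0, 1]")
    return var_prev + var_curr - pose_dependency * 2.0 * var_prev
```

With pose_dependency = 0 this reduces to the current sum of variances; with pose_dependency = 1 it gives Var(Y) - Var(X), matching the dependent-case derivation above.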

Any other suggestions?

ayrton04 commented 7 years ago

Very sorry for what is an unacceptable delay in responding. Thanks for working out the math with this. I think we'd have to just set some kind of differential_independent parameter, and if it's true, we just add the variances, and if it's not, we go the other route. I'm fine with just using an if statement and changing the covariance calculation based on the parameter.

skohlbr commented 7 years ago

Note that there has been quite a lot of research on fusing estimates that have unknown correlation (i.e., that are not independent, as a standard KF assumes) using covariance intersection.
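For reference, a minimal scalar sketch of covariance intersection (an editor's illustration with a fixed omega; real implementations choose omega to minimize the fused covariance):

```python
def covariance_intersection(x1, p1, x2, p2, omega=0.5):
    """Fuse two scalar estimates with unknown cross-correlation.
    CI is conservative: the fused covariance is never tighter than the
    unknown correlation can justify."""
    p_inv = omega / p1 + (1.0 - omega) / p2  # convex combination of informations
    p = 1.0 / p_inv
    x = p * (omega * x1 / p1 + (1.0 - omega) * x2 / p2)
    return x, p
```

Note that with two equal covariances and omega = 0.5, CI returns the same covariance rather than halving it as a naive independent fusion would; that conservatism is exactly what makes it safe for correlated inputs.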

ayrton04 commented 7 years ago

Thanks!

akhil22 commented 6 years ago

Hi, I just have a quick question: does the current implementation of robot_localization calculate the velocity covariances in differential mode using Var(x) - Var(y), or is it still adding the variances? If it's the latter, I would like to change it to use Var(x) - Var(y). Line 2798 in ros_filter.cpp seems to suggest that it's the latter.

ayrton04 commented 6 years ago

It still adds them, I'm afraid. The filter assumes independence between the measurements. Changing that would require that we add yet another parameter, something like odom0_differential_independent. Not impossible, of course. Just needs to be done.