Rare issue caused by navsat_transform and bad IMU data

cra-ros-pkg / robot_localization

robot_localization is a package of nonlinear state estimation nodes. The package was developed by Charles River Analytics, Inc. Please ask questions on answers.ros.org.

http://www.cra.com

Other

1.4k stars 898 forks source link

Rare issue caused by navsat_transform and bad IMU data #325

Open msy22 opened 7 years ago

msy22 commented 7 years ago

I've identified a rare issue that I think was caused by navsat_transform and an especially bad IMU reading. It's only happened once to my knowledge so this issue can take very low priority. However I wanted to mention it here because it could lead to a simple enhancement for _navsattransform.

I'm using a Jackal UGV in a sloped, outdoor area and localizing it using the _robotlocalization package. Normally each data set is consistent, however one data set sticks out. The following chart shows the profile of the slope, both datasets are taken directly from the /odometry/gps published by _navsattransform.

comparison of z2 profiles

The red trendline shows the correct cross-sectional profile of the slope. Every dataset except one agrees with this. The blue dataset is wrong, the raw GPS data matches the normal slope but the data in /odometry/gps does not.

To give another view of the same data, this is what /odometry/gps looks like when its displayed in RViz for the correct slope profile:

20161115_normal_z2_profile _cropped

And this is the same view of the distorted dataset:

20161027_distorted_z2_alt_cropped

What I think has happened is this: when _navsattransform launches it takes one reading from the IMU and uses this to determine the orientation of the /map reference frame (set up by a global instance of the _ekf_localizationnode that fuses IMU, wheel encoder and IMU data). Navsattransform then uses the /baselink -> /map transform to convert each GPS position into an equivalent pose in the /map reference frame. BUT if that initial IMU reading was very wrong, the reference frame would not be level with the horizontal plane, and this could add an angular offset to every subsequent odometry pose. _Navsattransform depends on, and yet also indirectly determines the position and orientation of /map. Because of this, it is effectively a positive feedback loop that could reinforce that angular error. I think that is what has happened here, because the erroneous profile (blue) has the same shape as the correct one (red), but with an angular offset. I also run another local instance of _ekf_localizationnode that fuses only IMU and wheel encoder data. In the erroneous dataset, the output from the local EKF showed the correct slope profile.

I think the most likely cause is this bad first IMU reading, and a possible enhancement to prevent this in future would be to have _navsattransform collect multiple IMU measurements and average them, as a sort of calibration process.

I think this is the cause, which is why I posted it here first. But I'm not 100% certain so if anyone thinks that there is a good chance that the cause is different, I will re-post this on ros.answers in greater detail for the wider community.

ayrton04 commented 7 years ago

So I'd have to dig into this more, but navsat_transform_node doesn't care about any of the orientation variables except yaw:

https://github.com/cra-ros-pkg/robot_localization/blob/kinetic-devel/src/navsat_transform.cpp#L271

Roll and pitch are inconsequential, since a robot that drives forward ten meters down a hill will still have a correct X-Y position, since it's just coming from the GPS antenna, which will recognize that you've traveled < 10 meters in the X-Y plane. In this case, I'm assuming that you enabled (or rather, didn't disable) altitude data from navsat_transform_node, correct? Is it possible that your GPS didn't have enough satellites during this particular run? Can you look at the raw NavSatFix messages and plot their altitudes?

Also, navsat_transform_node does not produce the map reference frame; both odom->base_link and map->odom are entirely handled by the EKF(s). The only transform published by navsat_transform_node is the world_frame->utm transform. The orientation of the map frame is entirely dependent on what sensors you're feeding it (really, orientation of the map frame is meaningless - it has to have an orientation relative to something. In this case, I'm assuming you mean the UTM grid?). navsat_transform_node will consider its first yaw reading and your robot's pose in the map frame at the time of the first GPS reading in order to compute the transform from the map frame to the utm frame.

msy22 commented 7 years ago

Thanks for your help again Tom! It looks like my theory is probably wrong. I'll look deeper and start putting together a ros.answers question. I also realize I haven't done a particularly good job of outlining my theory. So to clarify:

Yes, I do use the altitude data from navsat_transform and the raw altitude is always correct, which is why I think the problem occurs afterwards when the altitude data is being converted to the odometry message in /odometry/gps. As for the number of satellites, there are trees on either side of this slope at the bottom, so I do lose quite a few. I've plotted it all here:

better comparison of z2 profiles

The data comes in at different rates, so I've shuffled it a bit to make it match up as best I can.

Regarding my theory about the cause: Although navsat_transform only directly cares about the yaw, based on my current understanding it is indirectly dependent on the orientation (specifically roll and pitch) of the /map frame. The way I understand it, navsat_transform uses the /map -> /base_link to convert the GPS coordinates to an odometry message. That's based on this line of code:

https://github.com/cra-ros-pkg/robot_localization/blob/62c398047c31d01fa80744376355958b4403be21/src/navsat_transform.cpp#L659

(But honestly my understanding of CPP and the ROS TF package is still pretty rudimentary so this line may not being doing what I think it does). So what I think has happened is that /map has somehow become skewed, so every subsequent /map -> /base_link transform has this angular offset, which would cause the odometry in /odometry/gps to have an angular offset, but not the raw GPS data. My crude drawing my illustrate the idea a little more clearly (/map is shown in red):

20161123_110847

And because the global EKF fuses the data from /odometry/gps, this error is self-reinforcing. That's what I mean when I say navsat_transform is dependent on, yet also determines the location/orientation of /map.

Regarding you're second paragraph: Yes, sorry you are correct. the global EKF creates the /map reference frame. And yes i'm referring to the /map frame being relative to the UTM grid. As for the other sensors, I am fusing the absolute roll/pitch/yaw from the IMU in the global EKF but I don't understand EKFs well enough to say if this could also be the cause or if they should have pulled /map back to the correct orientation.

Like I said, this has only happened once, and most people won't be using robot_localization outdoors in the way I'm using it. So as far as bugs go it may not be worth the time it would take to find and debug it.

msy22 commented 7 years ago

For reference, here's is a link to a shortened bagfile of the distorted data run: https://drive.google.com/open?id=0B1KZT92BcdVNSm1xdWE2cURpeFU

ayrton04 commented 7 years ago

So the line that you referenced is not using the map frame data; it's just a utility method in TF that lets you convert from TF data types to ROS messages.

The only time navsat_transform_node uses the map->base_link transform is when it needs to know how to remove the base_link->gps offset (the offset from the origin has to be rotated by the vehicle's orientation before we can remove it). However, it does use the first measurement it receives from that EKF:

https://github.com/cra-ros-pkg/robot_localization/blob/kinetic-devel/src/navsat_transform.cpp#L694

This pose is equivalent to said transform. It uses the very first pose it receives on this topic in all future computations of the /odometry/gps data, and I'm guessing that's where this is coming from. It's not the IMU measurement, as again, we only care about yaw from that sensor. Unfortunately, the data coming in from the EKF has already been filtered, and so I'm reluctant to average poses in this case, since it points to a different issue. As I see it now, it could be one of two things:

The EKF receives a "bad" measurement from the IMU, and its pitch angle is wrong for just enough time steps that navsat_transform_node receives said measurement and uses it.
The EKF receives a wheel encoder odometry message before it receives an IMU message, and spits out some pose that does not include IMU data. This causes all future orientations to be off.

One thing I'd suggest is echoing /rosout in all of your bags, and grep on "navsat_transform." Look for the message that has this format:

--
  stamp: 
    secs: 1477530192
    nsecs: 269316514
  frame_id: ''
level: 2
name: /navsat_transform
msg: World frame->utm transform is Origin: (-4978634.8139487458393 -1490016.2970192900393 440627.61082779848948)
Rotation (RPY): (-0.09213616121609917442, -0.055672824378751983865, -1.1538439146748924458)

file: /home/administrator/robot_ws/src/robot_localization/src/navsat_transform.cpp
function: NavSatTransform::computeTransform
line: 285
topics: ['/rosout', '/tf_static', '/odometry/gps', '/gps/filtered']

See if the transform in the other bags differs significantly from this one.

msy22 commented 7 years ago

Looks like I've completely misunderstood that part of the code. So the map -> base_link transform is only used to remove that base_link -> gps rotational offset. But I'm still confused as to how navsat_transform_node converts UTM coordinates to a pose message relative to /map. That first pose from the EKF that is used in all future computations of /odometry/gps, is that pose used to directly calculate /odometry/gps, or is it just used to calculate the UTM coordinates of the /map origin? Thanks for the clarification by the way, I really appreciate it.

As for the map -> utm I couldn't get grep to cooperate with me but I got the messages out of rqt. Here is the tf for the distorted dataset:

20161027_distorted_transform

And here is a more typical tf:

20161027_normal_transform

The roll difference is 0.09 degrees, and the pitch difference is 0.6466 degrees. That seems like a negligible difference, so it potentially rules out the first theory. To be perfectly honest, if this is going to be difficult to pinpoint the exact cause, it may not be worth pursuing since it's such a rare bug, and I don't want to waste any more of your time that I already have.

ayrton04 commented 7 years ago

No worries at all! There's clearly something going on that is worth investigating. Thanks for reporting it.

msy22 commented 7 years ago

Not a problem. I'll keep thinking about it just in case I stumble across the answer, but I'd still like to clarify how navsat_transform works just so I have a better understanding of the system.

I've been re-reading your paper on the r_l package: A Generalized Extended Kalman Filter Implementation for the Robot Operating System, how much of this still applies to the current incarnation of r_l?

Specifically, are equations 8 and 9 still used (I presume by navsat_transform now) to convert each new GPS coordinate into a pose relative to /map?

r_l transform

r_l inverse transform

Because if they are, and they still use the initial UTM RPY for Phi, Theta and Psi, this would be my guess as to where the problem could have come from, especially if the transform matrix is set with a bad RPY measurement and then never changes. This also wouldn't affect the world-frame to UTM transform, which explains why it doesn't change much between the distorted and normal bagfile.

ayrton04 commented 7 years ago

I don't believe it's an "initial IMU reading" issue, at least not directly (more on that in a moment). You can verify this by following the path of the IMU orientation data through the code.

We receive it here, and place it in the transform_orientation_ variable.
Later, when we need to compute the map->utm transform, we use the transform_orientation_, but we immediately ignore the roll and pitch here, and use the yaw-only orientation here.
The only other place that we use transform_orientation is in removing the GPS offset.

However, the one place where a bad IMU measurement can most certainly cause an issue is in the initial odometry message that we take in. Again, we'll follow the data path for odometry.

We receive it here and store it in the transform_world_pose_ variable.
We use it to compute the map->utm transform here.

This is why I think it's the initial odometry message: if we again refer to this line, when your robot starts, you'll presumably have some non-zero pitch orientation in that initial odometry message. If the UTM coordinates by which we're multiplying that pose are massive - say, on the order of tens of thousands of meters - then a small change in the pitch angle could make a significant difference in the transform. Note that it's still technically an initial IMU issue as you said, in that the EKF is fusing the IMU data with its state estimate, and then passing that on to navsat_transform_node.

One way to test this would be to look at the values that get printed out here, and compare them across your good bag files and the one that's off.

ayrton04 commented 7 years ago

I think I need to revisit whether we should be using roll and pitch data at all in the initial transform generation.