ethz-asl / ethzasl_sensor_fusion

time delay single and multi sensor fusion framework based on an EKF

Fusing Kinect based 3D pose estimation (visual odometry) and IMU #23


saimanoj18 commented 10 years ago

Here I am trying to employ the ethzasl_sensor_fusion package to fuse an IMU with a Kinect-based visual odometry method. Specifically, I am using the ccny_rgbd ROS package to estimate the 3D pose of the Kinect, and an Xsens IMU with ethzasl_xsens_driver.

Q1. One observation is that visual odometry always (in my experience) under-estimates the actual motion: for example, if the Kinect is moved 1 metre, the visual odometry (VO) method reports only 85-90 cm. The IMU, I think, does the opposite and over-estimates the motion, reporting more than the traversed distance. So I assume that fusing VO and IMU can give better results, and I am heading in this direction. Do you think this makes sense?

My Kinect + IMU configuration is shown in the photo below: https://cloud.githubusercontent.com/assets/4462455/2600956/f2a2f6fc-bb08-11e3-8a6a-fc1b1d2575f4.jpg

In this case, the coordinate frames of the IMU and the Kinect (the pose frame from VO) align with each other, so the rotation is the same as the default one; I just added a rough translation between the devices.
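For concreteness, this is roughly how I write down the extrinsics I initialize with, as a homogeneous camera-to-IMU transform (identity rotation, hand-measured translation; the numbers below are placeholders, not my exact measurements):

```python
import numpy as np

# Rough camera-to-IMU extrinsics for the rig above: axes aligned
# (identity rotation), translation measured by hand. Placeholder values.
R_ci = np.eye(3)                      # Kinect and IMU axes aligned
p_ci = np.array([0.05, 0.00, 0.02])   # metres, rough ruler measurement

T_ci = np.eye(4)                      # homogeneous transform: camera -> IMU
T_ci[:3, :3] = R_ci
T_ci[:3, 3] = p_ci
print(T_ci)
```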

I tried moving the Kinect + IMU starting from the forward-looking orientation. Q2.1: The initial filter estimates are not good, i.e. if the VO says 0.01 m in x, the filtered output says 0.06 m in x. Q2.2: After some random motion, the EKF is able to give approximate estimates, but they are not as good as the VO. For example, an actual translation of 1 m is reported as 80-82 cm by the filter and 85-90 cm by the VO. The filtered value thus underestimates even the VO value, which is itself an underestimate in the first place. Could you point me to some directions to explore?

Q3. What kind of excitation or movement does the above setup need for the filter to converge?

Q4. How important are the covariance values of the Kinect's pose and the IMU? Right now I am setting the covariance values for the pose from reconfigure_gui (0.01 for both meas1 and meas2).

Q5. Most importantly, the pose update from the VO arrives at only around 8 Hz. Does this affect the accuracy of the filter? What can I do to cope with this? Should I use a bag file and run the algorithm slowly, or do you have any other suggestions?

Thank you for your time and effort,

Good day! Sai Manoj

stephanweiss commented 10 years ago

Hi Sai Manoj,

Thanks for using our framework. To your questions:

Q1: This makes sense. Bear in mind that ethzasl_sensor_fusion uses absolute poses. That is, you will most probably not do any better in position and yaw drift than the VO. The framework will, however, correct the VO drift in roll and pitch (i.e. your navigation frame will be gravity aligned) and yield state estimates at IMU rate rather than at the (usually slow) VO rate.
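To illustrate the rate argument, here is a minimal 1-D constant-velocity Kalman filter sketch (not our framework's code, just the principle): it predicts at a 100 Hz "IMU rate" and corrects with ~8 Hz "VO" position updates, yet a state estimate is available at every prediction step. All values are illustrative.

```python
import numpy as np

dt = 0.01                               # 100 Hz "IMU rate" prediction
F = np.array([[1.0, dt], [0.0, 1.0]])   # state: [position, velocity]
Q = 1e-4 * np.eye(2)                    # process noise (illustrative)
H = np.array([[1.0, 0.0]])              # "VO" measures position only
R = np.array([[0.01]])                  # VO measurement variance (illustrative)

x = np.zeros((2, 1))
P = np.eye(2)
rng = np.random.default_rng(0)

for k in range(1000):                   # 10 s of data
    x = F @ x                           # predict at every IMU sample
    P = F @ P @ F.T + Q
    if k % 12 == 0:                     # VO update every ~12th sample (~8 Hz)
        z = 0.1 * k * dt + rng.normal(0.0, 0.1)  # fake VO: 0.1 m/s motion
        y = z - H @ x                   # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
    # x is available here at the full 100 Hz, between the slow VO corrections

print(x.ravel())
```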

Q2: Make sure all your frames are set correctly: if you place the IMU horizontally, it should measure [0,0,+g], i.e. the z-axis should point upward. If it measures -g, you will have to correct g in the measurement.h file. For a front-looking setup, you will also have to adapt q_vw in the measurement.h file. This is the rotation between the Kinect map and a gravity-aligned navigation frame. E.g., if the coordinate system in which the Kinect is perceived has its z-axis forward, x to the left, and y up, then q_vw can be (w,x,y,z) = (sqrt(2)/2, -sqrt(2)/2, 0, 0), i.e. a -90° turn around the x-axis. You can choose the yaw arbitrarily since it is unobservable.
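As a sanity check on the numbers, a small numpy sketch (Hamilton convention, w-first; make sure the convention and rotation direction match what measurement.h expects):

```python
import numpy as np

def quat_about_axis(angle, axis):
    """Hamilton quaternion (w, x, y, z) for a rotation of `angle` rad about `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis /= np.linalg.norm(axis)
    return np.concatenate(([np.cos(angle / 2.0)], np.sin(angle / 2.0) * axis))

# -90 degrees about the x-axis, as suggested for the front-looking Kinect:
q_vw = quat_about_axis(-np.pi / 2.0, [1.0, 0.0, 0.0])
print(np.round(q_vw, 4))  # [ 0.7071 -0.7071  0.      0.    ]
```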

Q3: The Kinect gives you metric measurements, so you may want to fix the scale in the filter to 1; then you only need motion to make the Kinect-IMU extrinsics converge. These states converge best with rotational motion around the IMU center.
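If your setup exposes the scale handling through dynamic_reconfigure, fixing the scale could look like the sketch below. The node and parameter names ('pose_sensor', 'scale_init', 'fixed_scale') are assumptions here; check what your reconfigure_gui actually lists.

```python
import dynamic_reconfigure.client

# Assumed node and parameter names; verify them in your reconfigure_gui.
client = dynamic_reconfigure.client.Client('pose_sensor', timeout=5)
client.update_configuration({'scale_init': 1.0, 'fixed_scale': True})
```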

Q4: The covariance values are important for a consistent filter. However, if the filter does not behave as you expect, "wrong" covariance values are rarely the issue; mostly, some transformation is wrong in the initialization.
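To see why they matter nonetheless, here is a scalar sketch of how the measurement variance R steers the Kalman gain, and thus how much each VO update is trusted (values illustrative):

```python
P = 0.05  # current state variance
for R in (0.001, 0.01, 0.1):
    K = P / (P + R)   # scalar Kalman gain
    print(f"R = {R:<5} -> K = {K:.2f}")
# small R -> K near 1: the filter follows the VO almost blindly
# large R -> K near 0: updates barely correct the IMU prediction
```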

Q5: 8 Hz with the IMU you have should be quite OK. We flew a quadcopter with updates as low as 1 Hz and a very bad IMU, and it still worked. Just play the bag file at regular speed; the filter uses the timestamps in the messages anyway, not the time of message arrival.
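If you want to double-check your recording, a quick sketch with the rosbag Python API (bag name and topic are placeholders for your own data) prints the header stamps the filter will actually use next to the recording times:

```python
import rosbag

bag = rosbag.Bag('kinect_imu.bag')  # placeholder bag name
for topic, msg, t in bag.read_messages(topics=['/vo/pose']):  # placeholder topic
    # msg.header.stamp is what the filter uses; t is when it was recorded
    print(topic, 'stamp:', msg.header.stamp.to_sec(), 'recorded:', t.to_sec())
bag.close()
```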

Hope that helps.

Best, Stephan

