KumarRobotics / imu_vn_100

ROS driver for VN-100 of VectorNav Technologies
Apache License 2.0
46 stars 52 forks source link

Unstable generated frequency #25

Open Triocrossing opened 5 years ago

Triocrossing commented 5 years ago

Hi,

I'd like to thank you first for sharing the tool with us. We tested your driver on a vn-100 under the environment:

Ubuntu 16.04 + ROS Kinetic

Actually we record the ros message published by the driver and visualize the statistics of the bag recorded. One thing we noticed abnormal is that the generated frequency seems unstable along the temporal direction. See figures.

We visualize the data by subtracting consecutive frames, and show the temporal interval along the recording time. One may notice that:

a drop of relative exact 0.05s occurs around every 15~20 frames.

And this strange behavior is independent to the choice of _imu_rate or sync_rate_ in the configuration launch file, in other words, if we set the imu_rate to 100 we still notice this drop in similar amount and period (0.05s and 20 frames).

We've been investigating it for quite a long time but failed to come up with a solution, we'd like to know if you have any good insight about this problem.

Thanks ;-)

Test40_05_step_imu

Test40_06_step_imu

ke-sun commented 5 years ago

@versatran01 @chenste Have you guys met the same problem?

versatran01 commented 5 years ago

@ke-sun Yes, I think I have seen something similar, but I don't recall if it's with microstrain or vectornav.

Triocrossing commented 5 years ago

@ke-sun Yes, I think I have seen something similar, but I don't recall if it's with microstrain or vectornav.

Do you guys have any idea where could it come from? Thx

versatran01 commented 5 years ago

I'm just guessing here, what if you comment out the diagnostic updater?

Triocrossing commented 5 years ago

I'm just guessing here, what if you comment out the diagnostic updater?

I tried to comment out the updater_.Update() in the function PublishData(..) It seems the strange behavior persists in the system. But maybe you provided a very good hint, for me the behavior seems triggered by two independent update system holding different frequencies.

Triocrossing commented 5 years ago

@bigea

versatran01 commented 5 years ago

How about disabling all output except for acc/gyro?

Triocrossing commented 5 years ago

How about disabling all output except for acc/gyro?

Nope, we deactivated them in the config launch file, it generates the same result.

versatran01 commented 5 years ago

Do you think these jumps are timestamped incorrectly (by the driver) or are they output incorrectly by the sensor?

Triocrossing commented 5 years ago

Do you think these jumps are timestamped incorrectly (by the driver) or are they output incorrectly by the sensor?

Sorry for the late reply. Honestly, I'm not quite sure. It seems the data is neither replica nor error data (it keeps somehow continuity wrt others). It can be the timestamp is wrong, cus when we set it at 200Hz, see fig1 before. It seems the "drop-off" generates a infinitely high frequency which is physically less plausible.

versatran01 commented 5 years ago

I wouldn't say it's "infinitely high", judging from the plot here unless you are fixing the axis limits then the drop-offs are a fixed +-0.004s. What I suggested before is that I suspect these imu measurements are correct, in the sense that the sensor outputs some measurements a little bit off the requested frequency, which are picked up by the driver and stamped accordingly. So both the value and the timestamp are correct, it is just the timestamp is not at the time that we expect it to be. Again, this is just my conjecture. I currently don't have a unit with me, but maybe you could try to apply some constant rotation to the unit and look at the values at those "drop-offs", if they look smooth, then I think we are fine. But if not, then it's some other issues.

Triocrossing commented 5 years ago

I wouldn't say it's "infinitely high", judging from the plot here unless you are fixing the axis limits then the drop-offs are a fixed +-0.004s.

Actually, yes it's the axis limits. Like i said before, the "drop-offs" constantly drops 0.005s so if we fix the frequency at 0.005, it drops to a near 0 period. True that I wouldn't say "infinitely" but the frequency is out of the capacity of the unit.

What I suggested before is that I suspect these imu measurements are correct, in the sense that the sensor outputs some measurements a little bit off the requested frequency, which are picked up by the driver and stamped accordingly. So both the value and the timestamp are correct, it is just the timestamp is not at the time that we expect it to be. Again, this is just my conjecture. I currently don't have a unit with me, but maybe you could try to apply some constant rotation to the unit and look at the values at those "drop-offs", if they look smooth, then I think we are fine. But if not, then it's some other issues.

According to some previous observation, the measurements are roughly continue and in-middle of two neighbours. I'll check the generated deltas to see if they are relatively consensus to a constant motion model. (if the real "drop-offs" are truly measured with a very short time after previous measurement, then it would be much more similar to the previous measurement than the following one. Otherwise, it should be similar to an average of the sum of previous and following ones.)

Again, big thanks for the help, I will tell you as soon as I have some results. (might be tomorrow

IanTheEngineer commented 5 years ago

I could reproduce this issue on the master branch, and can verify that https://github.com/KumarRobotics/imu_vn_100/pull/28 fixes most of this issue. This fix is only for the ROS2 dashing branch, but it shouldn't be difficult to cherry-pick/backport to the ROS1 master branch.

Scenario: per published message, the publisher grabs the most recent data and the current system timestamp to stuff into the message Header/stamp before publishing. This is problematic, as the actual data was read at some nebulous time prior to that stamp, and is impacted by the variable amount of time that passes before stamping due to priority of the publisher thread, and whether it is context-switched out. There are two mitigations for this issue: 1) Record the timestamp at the IMU level, at the same time as the data. See this and that in code. 2) Operate the ROS publisher loop in a more soft-real-time manner

The first point is far more important than the second. It allows us to properly filter the data with respect to the age of the data using message_filters, and will get rid of large spike in the timestamp data itself. This will not, however, remedy any jitter experienced by a subscriber relying on the message being received at a constant interval.

The second point can be done with the inclusion of realtime_tools RealTimePublisher or RealTimeBox to publish consistently in a separate thread from data collection. Also, avoiding any memory allocation during this period will help reduce jitter. Honestly, this point is not super important unless you're relying on the IMU data to be received at extremely regular intervals. Most likely the first point will suffice as it is the only point that impacts the published data itself.

The proof is in the plot: timestamp_with_labels

This plot was done by running the node in ROS2 at a baudrate of 921600 and published at 100Hz. The data is centered around 0.01 sec (as expected) with a maximum of +/-0.000004 sec jitter. None of the symptoms described in this ticket are present with the patch applied. Thanks to @clalancette for all the work he put into fixing this (alongside the ROS2 port in general :)

versatran01 commented 5 years ago

@IanTheEngineer Thanks for testing, do you think you can pull the fix from the other branch to master?

IanTheEngineer commented 5 years ago

I'm headed out of town for the weekend, but I will submit a patch early next week. The core change will be #28, converted to ROS1 Time: https://github.com/KumarRobotics/imu_vn_100/pull/28/files

versatran01 commented 5 years ago

I can also verify the fix.

Plot for delta time

Before: Selection_007

After: Selection_008

I will open a PR soon with the fix I pulled in from dashing.

versatran01 commented 5 years ago

There might be one tiny problem with this fix, which is the time stamp will slowly deviate from host time.

This plot shows ros::time::now() - fixed_time in ns Selection_009