OpenKinect / libfreenect2

Open source drivers for the Kinect for Windows v2 device

Investigate temporal synchronization correctness #721

Open ksze opened 8 years ago

ksze commented 8 years ago

Overview Description:

Is there any kind of guarantee about the synchronization among the depth/ir/colour frames when using SyncMultiFrameListener?

I looked at the implementation of bool SyncMultiFrameListener::onNewFrame(Frame::Type type, Frame *frame) and there seems to be no guarantee.

The only thing that the listener seems to guarantee is that all subscribed frame types are present. It doesn't seem to care if one frame type is lagging behind another.

For instance, I modified Protonect so it prints out rgb->timestamp - depth->timestamp on every frame, and so far the difference is always 179 or 180, i.e. 17.9 or 18.0 ms. This is a bit worrying because at 30 fps the interval between frames is only 33.3 ms; 17.9 ms is more than half of that. Do we know how the Kinect device sequences/muxes depth and colour frames? Could we be collecting the depth and colour frames in the wrong order?

If I remember correctly, the official Kinect for Windows v2 SDK seems to have some synchronization guarantee when using its MultiSourceFrameReader. If a depth or colour frame is dropped, the MultiSourceFrameReference will leave it out instead of giving you the last available frame, although I don't know how the official SDK associates the colour frame with the correct depth frame.

I have also looked at the sequence numbers of the frames. There seems to be always a difference between the depth packet's sequence number and the colour packet's sequence number, but the difference is a bit random between initializations of the Kinect device, so I don't think we can reliably use it to associate depth frames with colour frames.

You can also imagine the following sequence of packets, where an uppercase D/C is a depth/colour frame that arrives and a lowercase d/c is one that is dropped:

DCdCDCDcDCDcdcdcdcdcdC

With the current libfreenect2 implementation, it seems like the CDCD after the initial DCd would be treated as synchronized frames even though the depth frame is actually lagging behind the colour frame. The mis-ordering could go on for a very long time until a c frame is skipped. The sequence of DcdcdcdcdcdC at the end is also bad because the D and the C are in fact separated by many dropped c and d frames, but they would still be treated as synchronized frames.
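To make the failure mode concrete, here is a toy simulation (not the actual SyncMultiFrameListener code) of a "keep the latest frame of each type, deliver a pair as soon as both slots are filled" policy, using the D/C/d/c notation from the sequence above:

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Toy model of a naive set-completion policy: remember the latest frame of
// each type and emit a "synchronized" (depth, colour) pair of capture-order
// sequence numbers as soon as both slots are filled. Uppercase D/C = a frame
// that arrives, lowercase d/c = a dropped frame (this interpretation is an
// assumption based on the discussion, not on the real listener code).
std::vector<std::pair<int, int>> naivePairs(const std::string& seq) {
    std::vector<std::pair<int, int>> pairs;
    int depthSeq = 0, colorSeq = 0;          // capture-order counters
    int pendingDepth = 0, pendingColor = 0;  // 0 = slot empty
    for (char ev : seq) {
        int& seqNo = (std::tolower(ev) == 'd') ? depthSeq : colorSeq;
        ++seqNo;                              // dropped frames still advance capture order
        if (!std::isupper(ev)) continue;      // lowercase: frame never arrives
        if (std::tolower(ev) == 'd') pendingDepth = seqNo;
        else                         pendingColor = seqNo;
        if (pendingDepth && pendingColor) {
            pairs.emplace_back(pendingDepth, pendingColor);
            pendingDepth = pendingColor = 0;
        }
    }
    return pairs;
}
```

Run over the sequence above, this emits (1,1), (3,2), (4,3), (5,5), (6,11): the middle pairs are off by one capture slot, and the final pair spans the long run of dropped frames, illustrating both problems.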

Version, Platform, and Hardware Bug Found:

  1. git log -1 --oneline

    83f88b4c09f0b00724ae65785abcd4f3eeb79f93
  2. uname -a

    Linux KNPC0116 4.4.0-36-generic #55~14.04.1-Ubuntu SMP Fri Aug 12 11:49:30 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
  3. lsusb -t

    /:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
       |__ Port 2: Dev 10, If 0, Class=Hub, Driver=hub/1p, 5000M
           |__ Port 1: Dev 11, If 0, Class=Vendor Specific Class, Driver=usbfs, 5000M
           |__ Port 1: Dev 11, If 1, Class=Vendor Specific Class, Driver=usbfs, 5000M
           |__ Port 1: Dev 11, If 2, Class=Audio, Driver=snd-usb-audio, 5000M
           |__ Port 1: Dev 11, If 3, Class=Audio, Driver=snd-usb-audio, 5000M
    /:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/14p, 480M
       |__ Port 4: Dev 2, If 0, Class=Wireless, Driver=btusb, 12M
       |__ Port 4: Dev 2, If 1, Class=Wireless, Driver=btusb, 12M
       |__ Port 9: Dev 13, If 0, Class=Hub, Driver=hub/1p, 480M
    /:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/3p, 480M
       |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/8p, 480M
    /:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/3p, 480M
       |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/6p, 480M
  4. lspci -nn

    00:00.0 Host bridge [0600]: Intel Corporation 4th Gen Core Processor DRAM Controller [8086:0c00] (rev 06)
    00:02.0 VGA compatible controller [0300]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller [8086:0412] (rev 06)
    00:03.0 Audio device [0403]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller [8086:0c0c] (rev 06)
    00:14.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI [8086:8c31] (rev 04)
    00:16.0 Communication controller [0780]: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 [8086:8c3a] (rev 04)
    00:16.3 Serial controller [0700]: Intel Corporation 8 Series/C220 Series Chipset Family KT Controller [8086:8c3d] (rev 04)
    00:19.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection I217-LM [8086:153a] (rev 04)
    00:1a.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 [8086:8c2d] (rev 04)
    00:1b.0 Audio device [0403]: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller [8086:8c20] (rev 04)
    00:1c.0 PCI bridge [0604]: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 [8086:8c10] (rev d4)
    00:1c.2 PCI bridge [0604]: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #3 [8086:8c14] (rev d4)
    00:1d.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 [8086:8c26] (rev 04)
    00:1f.0 ISA bridge [0601]: Intel Corporation Q85 Express LPC Controller [8086:8c4c] (rev 04)
    00:1f.2 SATA controller [0106]: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] [8086:8c02] (rev 04)
    00:1f.3 SMBus [0c05]: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller [8086:8c22] (rev 04)
    06:00.0 Network controller [0280]: Intel Corporation Wireless 7260 [8086:08b1] (rev 6b)
christiankerl commented 8 years ago

you are right. there is no synchronization guarantee for SyncMultiFrameListener, as we don't (didn't) know how to synchronize the frames.

the "sync" in the class name refers to the fact that you can receive frames synchronously in the calling thread by calling waitForNewFrame(), not to any synchronization between color and depth. if you just register a framelistener it will be executed asynchronously on the packet processor's thread. admittedly, this fact might have gotten lost in the documentation process and is not obvious to a library user (@xlz improve documentation!?).

afaik there is no synchronization between the triggers of the color and depth cameras, so color and depth frames will always be taken at different times (on which of the 9 raw ir images would you synchronize?). maybe the MultiSourceFrameReader just outputs the color/depth pair with the least difference in timestamp and applies a threshold on the maximum difference? maybe there is also another sequence number hidden in the packet data?

however, it is probably a good idea to add a framelistener, which does some kind of temporal synchronization.

ksze commented 8 years ago

A bit more investigation reveals something interesting.

Using another PC of the exact same spec, but running Windows 8.1 Professional 64-bit and using the official SDK, with MultiSourceFrameReader.

The RelativeTime of the ColorFrame seems to always lag 6.25 or 6.375 ms behind the RelativeTime of the DepthFrame, InfraredFrame, BodyFrame, and BodyIndexFrame. Meanwhile, the RelativeTime always matches among the DepthFrame, InfraredFrame, BodyFrame, and BodyIndexFrame.

I wonder where the official SDK gets that RelativeTime. For one thing, the RelativeTime has 0.1 microsecond resolution, unlike the 0.1 millisecond resolution that libfreenect2 gets from the USB packets. I suspect that the RelativeTime is actually calculated on the PC, not on the sensor.

ksze commented 8 years ago

If we want to implement a listener that is really synchronized, I think we need to understand the change pattern of the packets' timestamp. As a uint32_t, it necessarily overflows every 5 days (2^32 ticks / 10000 ticks-per-second / 60 seconds-per-minute / 60 minutes-per-hour / 24 hours-per-day ~= 4.97 days). However, the value may actually get reset much earlier. There is also a non-zero chance that the behaviour changes from one firmware version to the next - although I think that would be quite stupid and I don't think Microsoft would do that.
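On the overflow point: as long as less than one full wrap (~5 days at 10000 ticks per second) elapses between two timestamps, unsigned subtraction already yields the correct delta, so a consumer does not need to detect the wrap explicitly. A minimal sketch:

```cpp
#include <cstdint>

// Elapsed ticks between two device timestamps, tolerating one uint32_t
// wraparound: unsigned subtraction is well-defined modulo 2^32. Assumes
// "later" really was sampled after "earlier" and that fewer than 2^32
// ticks (~5 days at 10000 ticks/s) elapsed between them.
uint32_t elapsedTicks(uint32_t earlier, uint32_t later) {
    return later - earlier; // modular arithmetic absorbs the overflow
}
```

Whether the device resets the counter before the natural wrap (e.g. on re-initialization) still needs to be tested, as asked below.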

Has anybody run tests to check how often the timestamp gets reset?

all-things-liquid commented 7 years ago

Hello everyone :),

Sorry to dig out this topic, but I have a question regarding synchronization too, this time between IR and depth frames. I know they derive from the same sensor, so they are aligned in terms of position (as they say in the MSDN documentation, "Note that the infrared frame is derived from the same sensor as depth, so the images are perfectly aligned. For example, the infrared pixel at row 5 col 9 goes with the depth pixel at row 5 col 9").

As it is said above too, "The RelativeTime of the ColorFrame seems to always lag 6.25 or 6.375 ms behind the RelativeTime of the DepthFrame, InfraredFrame, BodyFrame, and BodyIndexFrame. Meanwhile, the RelativeTime always matches among the DepthFrame, InfraredFrame, BodyFrame, and BodyIndexFrame."

Do we have the same result with libfreenect2? I'm probably going to use this library instead of the Microsoft SDK because of multiplatform/OS compatibility issues. I'm looking all over the Internet but can't seem to find an answer, maybe because it is too obvious. But I'm still asking, just to make sure.

I'm definitely a noob at Kinect development, so sorry if my question is stupid... but I thought it could be possible that for some reason the computation of the depth map takes longer than 1/30 of a second, so if we still have a sensor firing images at 30 FPS we could find ourselves with one data stream lagging behind the other. (Come to think of it, is the depth frame calculated somehow or is it just raw data from the time-of-flight sensor?)

Thank you in advance for your answer!

xlz commented 7 years ago

@LiquidStorm

Note that the infrared frame is derived from the same sensor as depth, so the images are perfectly aligned

We do have this guarantee. If one frame takes longer than 1/30s to compute there would be data loss instead of time delay between depth/IR. Imagine this: depth and IR are two sides of one coin, they are computed simultaneously.

RelativeTime of the ColorFrame seems to always lags 6.25 or 6.375 ms

I don't think it's even correct for the official docs to say this. The color camera has auto exposure, so its timing is always variable. There is no guarantee on the hardware level, so no for the MS SDK and no for ours. And there is also no guarantee on the USB level, because the color frames are transmitted without a realtime guarantee.

If you just want a comparison of the MS SDK with this library, I can say ours provides the same performance with better compatibility.

xlz commented 7 years ago

Action items

ksze commented 7 years ago

While investigating the behaviour of device timestamps and sequence numbers, we need to bear in mind that under low light conditions:

  1. the exposure time of the RGB frame may dramatically change, which may affect the timestamp;
  2. the RGB frame rate may drop to 15 FPS; I think the depth/infrared frame rate is not affected.

In that case, it may be necessary to also look at other metadata on the RGB frame (e.g. exposure time and gamma) to decide how to associate it with depth frames.

In particular, I'm worried about the following sequence of events:

If this happens, is colour frame 1 supposed to be associated with depth frame 1 or with depth frame 2?

xlz commented 7 years ago

I feel like it's not our job to decide whether to drop data just because of intricate details in the timing mechanism. This is policy, and policy should be controlled by the user. If the policy is implemented in this library, it necessitates a large buffer to examine temporal alignment, which increases the delay before the user can get the data.

If the user cares enough about time correctness, they should create two listeners for depth and rgb separately and decide whether the frames are aligned or not.
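A minimal sketch of that user-side policy, assuming the user has collected sorted timestamp lists from two separate listeners. All names and the threshold are illustrative, and 32-bit timestamp wraparound is ignored for simplicity:

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Pair each depth timestamp with the nearest colour timestamp, keeping the
// pair only if the gap is within a threshold (in device ticks). Both input
// streams are assumed sorted; each colour frame is consumed by at most one
// depth frame. This is a user-side sketch, not library API.
std::vector<std::pair<uint32_t, uint32_t>>
pairByNearestTimestamp(const std::vector<uint32_t>& depth,
                       const std::vector<uint32_t>& color,
                       uint32_t maxDeltaTicks) {
    auto dist = [](uint32_t a, uint32_t b) { return a > b ? a - b : b - a; };
    std::vector<std::pair<uint32_t, uint32_t>> out;
    std::size_t j = 0;
    for (uint32_t d : depth) {
        // advance while the next colour timestamp is closer to this depth frame
        while (j + 1 < color.size() && dist(color[j + 1], d) < dist(color[j], d))
            ++j;
        if (j < color.size() && dist(color[j], d) <= maxDeltaTicks) {
            out.emplace_back(d, color[j]);
            ++j; // consume: a colour frame pairs with at most one depth frame
        }
    }
    return out;
}
```

For example, with depth timestamps {1000, 1333, 1666}, colour timestamps {1180, 1513, 2500}, and a 200-tick threshold, only the first two depth frames get a colour partner; the third has no colour frame close enough and is left unpaired, which is the "drop instead of mis-associate" policy discussed above.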

The things that are needed in this library are better timestamping and info about latency.

timprepscius commented 7 years ago

Is there any more news about the timestamps in the Kinect? I am wondering whether the RelativeTime in the official SDK is measured on the device or on the computer.

(or both)

Here is what I'm seeing from the Windows SDK. I record the RelativeTime from each frame and calculate the delta time by subtracting the previous frame's RelativeTime from the current frame's.

I keep an array of the last 150 of these delta times and compute their mean/stddev, then watch this sliding window of 150 over time.
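The sliding-window statistic described here can be sketched as follows (class and method names are made up, not SDK API):

```cpp
#include <cmath>
#include <cstddef>
#include <deque>
#include <utility>

// Keep the last N frame-to-frame deltas and report their mean and
// (population) standard deviation. Statistics are recomputed on demand,
// which is fine for a window of 150.
class DeltaWindow {
public:
    explicit DeltaWindow(std::size_t capacity) : capacity_(capacity) {}

    void addDelta(double delta) {
        deltas_.push_back(delta);
        if (deltas_.size() > capacity_) deltas_.pop_front();
    }

    std::pair<double, double> meanStddev() const {
        if (deltas_.empty()) return {0.0, 0.0};
        double sum = 0.0;
        for (double d : deltas_) sum += d;
        const double mean = sum / deltas_.size();
        double var = 0.0;
        for (double d : deltas_) var += (d - mean) * (d - mean);
        return {mean, std::sqrt(var / deltas_.size())};
    }

private:
    std::size_t capacity_;
    std::deque<double> deltas_;
};
```

Recomputing on demand keeps the sketch simple; an incremental (Welford-style) update would avoid the O(N) pass per query if the window were much larger.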

Recording just IR and using the RelativeTime from the IR frame:

The mean is stable around 33333x (about 333,333, i.e. 1/30 of a second). The stddev is around 4000. These units are kinect RelativeTime units, so 10000 ticks in a millisecond and 1000 milliseconds in a second.

So, I believe, I am seeing that the kinect relative time stamp of the IR sensor varies from an absolute 1/30 of a second by 400 microseconds.

This variance is independent of possible processing going on. (The Kinect, with the Microsoft SDK, slows down and speeds up its IR frame delivery depending on the nearness of objects in the scene.)

--

Recording just Color and using the RelativeTime from the Color Frame.

The color camera is less stable. The mean ranges from 333200 to 333333, and the stddev ranges from 5000 to 6200.

The color camera does switch to 666666 (i.e. 1/15 of a second) in low light.


Do you have any information about the timestamps in the packets? I wonder if they follow a similar pattern?

In the comments above, it says there are 9 raw ir images. Does this mean that for each 1/30 of a second, there are 9 IR frames delivered to the client?

Does libfreenect2 use all 9 to create a depth frame?

Do you get a timestamp with each sub frame?

Do you think it might be possible to use 4 sub IR frames from one packet and 5 from another? I am looking to do very precise synchronization between multiple Kinects.

Thanks in advance for any answers to my never ending question list.

xlz commented 7 years ago

I have no idea what your measurements mean on a first look.

Do you have any information about the timestamps in the packets

https://github.com/OpenKinect/libfreenect2/issues/849

Does this mean for each 1/30 of a second, there are 9 IR frames delivered to the client?

https://github.com/OpenKinect/libfreenect2/wiki/Linux-USB-Notes#design-of-alternative-transfer-pool

Does libfreenect use all 9 to create a depth frame?

True.

Do you get a timestamp with each sub frame?

Unnecessary. Isochronous transfers are isochronous.

Do you think it might be possible to use 4 sub ir frames from one packet and 5 from another?

No.

timprepscius commented 7 years ago

I added code for timestamps. I am seeing interesting and good data.

The depth received timestamp is the host timestamp of the first whole USB packet received; the k-timestamp is the timestamp that comes with the packet.

At first glance, the delta received_timestamp aligns nicely with the delta kinect-timestamp, which is great.

    depth received timestamp 1497012622805607 dt 64004 k-timestamp 56060 k-dt 533
    depth received timestamp 1497012622853943 dt 48336 k-timestamp 56327 k-dt 267
    depth received timestamp 1497012622885779 dt 31836 k-timestamp 56594 k-dt 267
    depth received timestamp 1497012622949639 dt 63860 k-timestamp 57127 k-dt 533
    depth received timestamp 1497012622981634 dt 31995 k-timestamp 57394 k-dt 267
    depth received timestamp 1497012623013618 dt 31984 k-timestamp 57660 k-dt 266
    depth received timestamp 1497012623045644 dt 32026 k-timestamp 57927 k-dt 267
    depth received timestamp 1497012623077612 dt 31968 k-timestamp 58194 k-dt 267
    depth received timestamp 1497012623109594 dt 31982 k-timestamp 58460 k-dt 266

I will attach the diff used to create these timestamps. One thing I will probably change for myself is switching the clock to a std::chrono high-resolution clock.

If you look over the diff and notice any obvious problems with how I am doing this, please let me know. However, at this point, I believe it is working. packet_timestamps.txt