OpenKinect / libfreenect2

Open source drivers for the Kinect for Windows v2 device

What function does the x_table serve? #1041

Closed willem0 closed 5 years ago

willem0 commented 5 years ago

Overview Description:

Regarding the corrections to depth used here: https://github.com/OpenKinect/libfreenect2/blob/master/src/cpu_depth_packet_processor.cpp#L704

    float max_depth = phase * params.unambigious_dist * 2;
    xmultiplier = (xmultiplier * 90) / (max_depth * max_depth * 8192.0);
    float depth_fit = depth_linear / (-depth_linear * xmultiplier + 1);

But that still leaves a head-scratching multiplicative term of 1/(1 - xm * depth_linear) that I don't understand, where:

    xm = (x_table / 8192) * 90 / (max_depth^2)
    max_depth = phase * params.unambigious_dist * 2

Can anyone explain this multiplicative term? It's small (usually between .99 and 1.01) but not quite small enough to ignore. It sort of looks like a planar homography, maybe due to the projector and receiver not being co-located? It looks like the 90 term could be 3*lcm(3,15,2), relating to the 3 frequencies.

I'm not complaining about the code readability or magic numbers or anything, I just want to understand. Thanks.

Also, is my understanding correct that running Protonect with a pipeline (like ./Protonect cpu) computes an independent depth estimate while running just ./Protonect simply passes along the depth map computed internally by the Kinect itself? Is the point of the cpu pipeline to provide an independent estimate from the raw voltage values found in the depth packet?

xlz commented 5 years ago

See https://github.com/OpenKinect/libfreenect2/issues/144#issuecomment-136962746.

xlz commented 5 years ago

8192 is the scale of the fixed point numbers. The ToF principle uses phases so it has periods, i.e. it repeats. So unambiguous distance is the interval where phase measurements are not repeated yet. From phase measurements you get the total travel distance from the light source to the focal point reflected off the object. Then you correct the total travel distance to the distance from the object to the sensor. But that corrected distance is still not the Z coordinate we want. You further correct it with some geometry.
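As a rough sketch of the first step described here: a wrapped phase at one modulation wavelength maps to a total travel distance, and that mapping repeats once per wavelength. The function names are hypothetical, and halving the travel distance assumes the light source and sensor are co-located, which is exactly the assumption the later geometric correction relaxes.

```cpp
#include <cmath>

// Total travel distance (mm) implied by a wrapped phase in [0, 2*pi)
// at a given modulation wavelength (mm). The result repeats every
// wavelength, which is why a single frequency is ambiguous.
inline double totalTravelMm(double phase_rad, double wavelength_mm)
{
  const double two_pi = 2.0 * std::acos(-1.0);
  return (phase_rad / two_pi) * wavelength_mm;
}

// One-way distance under the simplifying ASSUMPTION that emitter and
// sensor sit at the same point, so the light path is out and back.
inline double oneWayIfColocatedMm(double phase_rad, double wavelength_mm)
{
  return totalTravelMm(phase_rad, wavelength_mm) / 2.0;
}
```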

willem0 commented 5 years ago

Thanks Lingzhu,

That thread is very informative, and has the answer I was looking for. (I had suspected the correction term was due to non-co-located light source, as above.) Great detective work on the tables.

I'm not able to make the connection between the line d 1/z = and the following line z =. Can you give me a hint or propose a substitution that would help make this connection?

The genesis of cpu_depth_packet_processor.cpp and the corresponding depth_linear / depth_fit correction was traced to this commit. So it's entirely possible that when trying to match the SDK the L^2 term was simply left off (it produces less than 1% of error for distances larger than 0.5m, so easy to ignore). But it seems it should go in now, right? (Better late than never.)

"So unambiguous distance is the interval where phase measurements are not repeated yet."

Not so, as pointed out earlier, none of the wavelengths of 3750, 18750, 2500 mm match this so-called unambiguous distance of 6250/3 mm, nor does their least common multiple, 37500 mm, the maximum (round-trip) distance that can be resolved with phase unwrapping. As you then pointed out, it's just a multiplier to make the model fit.
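The arithmetic in this paragraph can be checked with a few lines of C++17. This is only a sketch: the wavelengths are the ones quoted above, and the names `kWavelengthsMm`, `jointPeriodMm`, and `commonUnitMm` are made up for illustration.

```cpp
#include <numeric>  // std::gcd, std::lcm (C++17)

// Wavelengths quoted in the thread, in millimetres.
constexpr long kWavelengthsMm[3] = {3750, 18750, 2500};

// Round-trip distance after which all three phases repeat together.
constexpr long jointPeriodMm()
{
  return std::lcm(std::lcm(kWavelengthsMm[0], kWavelengthsMm[1]),
                  kWavelengthsMm[2]);
}

// Largest length that divides all three wavelengths evenly.
constexpr long commonUnitMm()
{
  return std::gcd(std::gcd(kWavelengthsMm[0], kWavelengthsMm[1]),
                  kWavelengthsMm[2]);
}
```

`jointPeriodMm()` evaluates to 37500 and `commonUnitMm()` to 1250, neither of which is 6250/3.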

But I have a guess for this quantity. If you have 3 independent estimates d0, d1, d2 of a scalar d, the sensible way to fuse them is to take their average (d0+d1+d2)/3. But if some of these are more reliable than others, it makes sense to do a weighted average. In this case, it seems that the weights the MS SDK uses are 1/3, 1/15, and 1/2 corresponding to the ratios of the 3 frequencies. These sum to 0.9 instead of 1.0, so it's necessary to normalize.

Here, the three independent distance estimates are

    d0 = c2 ? t5 + 15.0f : t5;
    d1 = c2 ? t1 + 15.0f : t1;
    d2 = std::floor((-t2 + t6) * 0.5f + 0.5f) * 2.0f + t2;

using the notation from processPixelStage2, where the estimates are in units of the greatest common divisor of the wavelengths, 1250 mm. The resulting (one-way) distance estimate is (d0/3 + d1/15 + d2/2) / 0.9 * 1250 / 2 mm, which matches the jury-rigged estimate found in the code: (d0/3 + d1/15 + d2/2) / 3 * 2083.3333.

However, the maximum likelihood estimate should not use 1/3, 1/15, and 1/2 as the relative weights, but the square of these terms. Lawin et al. seem to recognize as much (see Eq.(10) of their paper), but they nevertheless have evidently deferred to precedent and kept the same weights in their kernel-density-based implementation.

Cheers, William

[edit: round-trip vs. one-way discrepancies]

xlz commented 5 years ago

[equation image: codecogseqn]

I don't really know what the unambiguous distance does. That was a guess. That KDE paper is more informative on this than me.

xlz commented 5 years ago

You should look at the OpenGL shaders. They are directly ripped from Microsoft binaries thus canonical. The rest is all replication of the OpenGL shaders. I'm keeping L^2 out so the result is consistent even if it's wrong.

willem0 commented 5 years ago

The derivation makes sense now, thanks.

"ripped from Microsoft binaries thus canonical"

The names of the variables or just the values? The values make sense to me for the reasons mentioned above, but the names seem wrong. The unambiguous distance is the LCM of the 3 wavelengths, 37.5 meters round trip (18.75 meters one way), as also pointed out in the KDE paper, and has nothing to do with the 2083.33 figure. The latter is a hodgepodge of floating constants to make the weighted average work.

floe commented 5 years ago

Just so I don't forget, @saulthu recently pointed me to Microsoft's NuiSensorLib, which contains some additional bits and pieces of information about internal Kinect v2 data structures. I'm planning to go through that file before the holidays and update the libfreenect2 headers where appropriate.