OpenKinect / libfreenect

Drivers and libraries for the Xbox Kinect device on Windows, Linux, and OS X
http://openkinect.org

PlayerID and depth map raw data #603

Open mjsML opened 4 years ago

mjsML commented 4 years ago

I was working with the depth data from the Python wrapper and found the returned values to be way off.

This sort of contradicts Microsoft's claims of "1 cm" accuracy (by a lot, not your average SNR).

In the python wrappers there is a file that converts the depth data into a "pretty" format.

While investigating I found #588. Neither @imadr nor I saw the logic behind it, and we both found it odd.

I dug deeper into the depth format and I believe I found the answer.

From what I understand about the basic architecture of the Kinect / PrimeSense tech, it's not just a sensor array; there is in fact a Marvell SoC (Marvell AP102) [1] that runs a multitude of algorithms to simplify development (skeletonization, facial recognition, etc.).

If skeletonization is initiated, the SoC performs a looped process of identification/classification and annotates/predicts each pixel with a PlayerID.

The high 13 bits of the 16-bit value give you the measured distance in millimeters. The first three bits code for the player that the device has identified, but this is only activated if skeleton tracking is enabled. If you don't enable skeleton tracking, the three bits are set to zero. [2]

I tried to shift by 3 instead of 2 in the wrapper and the values made much more sense after.

My first thought was to fix the Python wrapper/examples; however, this would need to be done across all the wrappers and would be sort of hacky (I'm not interested in learning AS3, thanks).

I'm not sure if there are plans to implement skeleton tracking or not; either way, the "raw depth" needs to be shifted by 3 at the driver / core lib level so that the depth information is passed to the wrappers in the correct format (ushort), and maybe we implement a get_PlayerID function that returns the correct 3 bits (it won't be useful until tracking is actually enabled). Something like the sketch below.
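
For illustration, a rough sketch of what I mean at the core level, following the bit layout quoted below from [2] (the function name is just illustrative, not an existing API):

#include <stdint.h>
#include <stddef.h>

/* Sketch only: split each raw 16-bit sample into the corrected depth and the PlayerID. */
static void split_depth_frame(const uint16_t *raw, uint16_t *depth_mm,
                              uint8_t *player_id, size_t n_pixels)
{
    for (size_t i = 0; i < n_pixels; i++) {
        depth_mm[i]  = raw[i] >> 3;  /* high 13 bits: distance in millimeters */
        player_id[i] = raw[i] & 0x7; /* low 3 bits: PlayerID, zero unless skeleton tracking is on */
    }
}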

@marcan @JoshBlake thoughts? I'm happy to fix/PR if you agree. Are there other areas of the package that this change would break that I should be aware of?

mjsML commented 4 years ago

Adding @zarvox since he is the maintainer of the C API as per the wiki ...

piedar commented 4 years ago

Does this affect all of the depth modes? I'm thinking of backwards compatibility, like if some application has a hardcoded threshold like if (depth[x][y] < 200) { /* object is close */ } then that check might break against the new correct values.

mjsML commented 4 years ago

Thanks @piedar for the input. Yes, it does, unfortunately ... according to the documentation of Microsoft's SDK, the high 13 bits are the real depth data, with the 3 remaining bits being the PlayerID (a 0-7 value). For your example the check itself would not break, but the values would be 1/8 of what you expect with the buggy implementation across the board (the 200 would effectively be 25 if you don't shift at all, or 100 if you shift right by 2 bits like the Python wrapper does).
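
To put rough numbers on that, under the assumption above that the buggy raw value is the true depth in millimeters with 3 extra low bits appended (illustrative only):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t true_mm = 200;       /* a real object at 200 mm */
    uint16_t raw = true_mm << 3;  /* the buggy path reports 1600 */
    printf("raw=%d  raw>>2=%d  raw>>3=%d\n", raw, raw >> 2, raw >> 3);
    /* An old check `raw < 200` only fires for objects closer than 200/8 = 25 mm;
       with the Python wrapper's shift by 2 it fires below 200/2 = 100 mm. */
    return 0;
}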

The visual artifacts from #588 indicate that there are parts that take the correct max value into consideration and then "overflow", causing those visual artifacts (because the values are inflated by 3 bits on the right, i.e. value = correctValue * 8, so to speak).

Maybe we create a new function then, something like getRealDepth(), and keep the old one as is for compatibility for a couple of releases, marking it as deprecated with a warning/note/debug message about the bug?

piedar commented 4 years ago

Alright, here are a couple wild ideas and I hope someone will tell me if they're too wild.

The first option is to accept an environment variable called FREENECT_DEPTH_OVERFLOW so you can change the behavior at runtime. The logic would be something like

#include <stdlib.h>   /* getenv */
#include <string.h>   /* strcmp */
#include <strings.h>  /* strcasecmp (POSIX; stricmp is Windows-only) */

short fn_should_overflow(void) {
  /* Default to the legacy (overflowing) behavior when the variable is unset. */
  const char* envOverflow = getenv("FREENECT_DEPTH_OVERFLOW");
  if (envOverflow == NULL) {
    /* todo: change this to 0 after a few releases */
    FN_WARN("Providing imprecise depth values for legacy compatibility");
    return 1;
  }
  if (strcmp(envOverflow, "1") == 0
   || strcasecmp(envOverflow, "true") == 0) {
    FN_DEBUG("User chose to use imprecise depth values");
    return 1;
  }
  return 0;
}
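
With that logic, a user could opt into the corrected values right away by exporting FREENECT_DEPTH_OVERFLOW=0, and we could flip the default later. Inside the core depth path it might be consumed roughly like this (a sketch only; the helper name is illustrative, not existing API):

static uint16_t fn_cook_depth_sample(uint16_t raw)
{
    if (fn_should_overflow())
        return raw;       /* legacy behavior: the 3 low PlayerID bits stay in the value */
    return raw >> 3;      /* corrected behavior: depth in mm, PlayerID bits dropped */
}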

Another option is a bit more automatic but would complicate the API.

Right now, these are the available depth formats

typedef enum {
    FREENECT_DEPTH_11BIT        = 0, /**< 11 bit depth information in one uint16_t/pixel */
    FREENECT_DEPTH_10BIT        = 1, /**< 10 bit depth information in one uint16_t/pixel */
    FREENECT_DEPTH_11BIT_PACKED = 2, /**< 11 bit packed depth information */
    FREENECT_DEPTH_10BIT_PACKED = 3, /**< 10 bit packed depth information */
    FREENECT_DEPTH_REGISTERED   = 4, /**< processed depth data in mm, aligned to 640x480 RGB */
    FREENECT_DEPTH_MM           = 5, /**< depth to each pixel in mm, but left unaligned to RGB image */
    FREENECT_DEPTH_DUMMY        = 2147483647, /**< Dummy value to force enum to be 32 bits wide */
} freenect_depth_format;

What if we added new formats to replace the existing ones by name but leave them intact by value? So then we'd have

typedef enum {
    /* these old formats are imprecise and may overflow to produce unexpectedly small values */
    FREENECT_DEPTH_11BIT_OVERFLOW        = 0, /**< 11 bit depth information in one uint16_t/pixel */
    FREENECT_DEPTH_10BIT_OVERFLOW        = 1, /**< 10 bit depth information in one uint16_t/pixel */
    FREENECT_DEPTH_11BIT_PACKED_OVERFLOW = 2, /**< 11 bit packed depth information */
    FREENECT_DEPTH_10BIT_PACKED_OVERFLOW = 3, /**< 10 bit packed depth information */
    FREENECT_DEPTH_REGISTERED_OVERFLOW   = 4, /**< processed depth data in mm, aligned to 640x480 RGB */
    FREENECT_DEPTH_MM_OVERFLOW           = 5, /**< depth to each pixel in mm, but left unaligned to RGB image */

    /* these new formats are more precise and recommended for new applications */
    FREENECT_DEPTH_11BIT        = 6, /**< 11 bit depth information in one uint16_t/pixel */
    FREENECT_DEPTH_10BIT        = 7, /**< 10 bit depth information in one uint16_t/pixel */
    FREENECT_DEPTH_11BIT_PACKED = 8, /**< 11 bit packed depth information */
    FREENECT_DEPTH_10BIT_PACKED = 9, /**< 10 bit packed depth information */
    FREENECT_DEPTH_REGISTERED   = 10, /**< processed depth data in mm, aligned to 640x480 RGB */
    FREENECT_DEPTH_MM           = 11, /**< depth to each pixel in mm, but left unaligned to RGB image */

    FREENECT_DEPTH_DUMMY        = 2147483647, /**< Dummy value to force enum to be 32 bits wide */
} freenect_depth_format;

Then existing binaries compiled against libfreenect will continue to use the old behavior, while freshly compiled source code will use the new correct behavior. Though I don't know if this would work perfectly across all the language wrappers.

mjsML commented 4 years ago

I'm more inclined toward the first one (for now), with the second one coming later (maybe 0.6?) with a twist.

The first one would be an "instant fix" that we would merge to master as soon as it's done. The second one would be an API cleanup sort of thing, and we can ship it in the next release with a clear note in the README. My twist for the second one:

typedef enum {
    /* these old formats are imprecise and may overflow to produce unexpectedly large values */
    //FREENECT_DEPTH_11BIT        = 0, /**< 11 bit depth information in one uint16_t/pixel */
    //FREENECT_DEPTH_10BIT        = 1, /**< 10 bit depth information in one uint16_t/pixel */
    //FREENECT_DEPTH_11BIT_PACKED = 2, /**< 11 bit packed depth information */
    //FREENECT_DEPTH_10BIT_PACKED = 3, /**< 10 bit packed depth information */
    //FREENECT_DEPTH_REGISTERED   = 4, /**< processed depth data in mm, aligned to 640x480 RGB */
    //FREENECT_DEPTH_MM           = 5, /**< depth to each pixel in mm, but left unaligned to RGB image */

    /* These definitions are left commented out here for compatibility reasons: values 0-5 give you
       the entire 2-byte value without clipping the PlayerID, and they stay reserved so that old
       linked binaries keep working as expected. */

    /* these new formats are more precise and recommended for new applications */
    FREENECT_DEPTH_11BIT        = 6, /**< 11 bit depth information in one uint16_t/pixel */
    FREENECT_DEPTH_10BIT        = 7, /**< 10 bit depth information in one uint16_t/pixel */
    FREENECT_DEPTH_11BIT_PACKED = 8, /**< 11 bit packed depth information */
    FREENECT_DEPTH_10BIT_PACKED = 9, /**< 10 bit packed depth information */
    FREENECT_DEPTH_REGISTERED   = 10, /**< processed depth data in mm, aligned to 640x480 RGB */
    FREENECT_DEPTH_MM           = 11, /**< depth to each pixel in mm, but left unaligned to RGB image */

    FREENECT_DEPTH_DUMMY        = 2147483647, /**< Dummy value to force enum to be 32 bits wide */
} freenect_depth_format;

And we would check for the new values in the core logic that extracts the values off the device; the API would not get more complicated, and the change wouldn't break existing code.
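
Roughly like this in the extraction path (a sketch only, assuming the renumbered enum above and glossing over the packed formats):

/* Legacy values 0-5 keep the old, uncorrected output so existing binaries are
   unaffected; the new values 6-11 drop the 3 PlayerID bits. Sketch only. */
static uint16_t convert_depth_sample(uint16_t raw, freenect_depth_format fmt)
{
    if (fmt >= FREENECT_DEPTH_11BIT && fmt <= FREENECT_DEPTH_MM)
        return (uint16_t)(raw >> 3);  /* new formats: corrected depth */
    return raw;                       /* legacy values 0-5: unchanged */
}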

On that topic, do we know the command to enable skeleton tracking? (to get those PlayerID values instead of zeros)

piedar commented 4 years ago

That is a little cleaner for the API and those definitions for values 0 - 5 could be copied to an internal header. Anyway I think the environment variable would be a good start for now.

Unfortunately I don't know about the skeleton tracking - this info about the top 3 bits is news to me.

piedar commented 4 years ago

Hmm, I've been playing around with this and I'm not seeing the stripe effect, even when I comment out depth >>= 2 in frame_convert2.py. To see the effect, I have to shift in the other direction, like depth <<= 2, which is obviously wrong. Do you see the effect when running freenect-glview?

mjsML commented 4 years ago

The lines are a function of the depth of the elements in front of the sensor. I have only debugged the Python wrapper so far.

In theory there should be no need for any "pretty" conversion, right? Also, the depth values are estimates in millimeters, so the minimum value can never be less than 400 and the maximum can never be more than 8191 [1].
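
As a quick sanity check on that range for the corrected values (a sketch only):

#include <stdint.h>

/* Plausibility check based on the 400-8191 mm range quoted from [1]. */
static int depth_mm_is_plausible(uint16_t depth_mm)
{
    return depth_mm >= 400 && depth_mm <= 8191;
}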

There is another tell that those 3 extra bits are garbage in the depth data: if you use this:

import cv2
import numpy as np
import freenect

def get_video():
    # grab one frame from the IR camera (10-bit values in a uint16 array)
    array, _ = freenect.sync_get_video(0, freenect.VIDEO_IR_10BIT)
    return array

def pretty_depth(depth):
    # clip to the 10-bit range, drop the low 3 bits, and fit into 8 bits for display
    np.clip(depth, 0, 2**10 - 1, depth)
    depth >>= 3
    depth = depth.astype(np.uint8)
    return depth

if __name__ == "__main__":
    while 1:
        # get a frame from the IR camera
        frame = get_video()
        # display the IR image
        frame = pretty_depth(frame)
        cv2.imshow('IR image', frame)

        # quit the program when the 'esc' key is pressed
        k = cv2.waitKey(5) & 0xFF
        if k == 27:
            break
    cv2.destroyAllWindows()

You'd get clean data from the IR camera (forgetting even the depth calculated by the SoC). Now try the same code with the pretty_depth call removed: I get a totally black image.

The "pretty" conversion does a good job for near objects; however, it seems this bug has even been researched in academia [2]. Shifting and clipping the values makes everything relatively near look the same, while far objects are cut out. The technical report has this graph showing the issue:

[Figure: performance graph from the technical report]

Aside from the saturation at 1000 (which the clip function is doing), the deviation from the correct values is caused by this issue.

The C API returns the bare 16 bits or packs them into fewer bits using the wrong input, and the Python API clips and shifts ... Anyway, I'm caught up in a personal thing ATM ... The pull request should come next week.

mjsML commented 4 years ago

Also, I was going through PrimeSense's original OpenNI (that is quite a piece of work: scene graphs, command queues, graph queries ...). I think I might have found the skeleton tracking code... Will update soon once I test it.