Hi, I am very excited about your project. It seems your method works well on hand pose recognition. HandTracker works well in your "synthetic-hand-tracker-vs2015", but it works badly on my data.
I downloaded some MSRA hand data. Their camera resolution is 320x240, the focal length is 241.42, and the center is (160, 120). I made a small modification to "realtime-hand-tracker-vs2015" so it supports recognition of a single frame. msra-hand.txt contains the depth data for a single frame. Here is some code:
#include <fstream>
#include <string>

// Load a single MSRA depth frame stored as whitespace-separated integers (mm).
Image<unsigned short> LoadFrame(const DCamera& dCam, const std::string& filename)
{
    std::ifstream iStream(filename);
    unsigned short background = 3;  // note: unused below; raw values are stored as-is
    Image<unsigned short> frame(dCam);
    for (int yy = 0; yy < dCam.dim().y; ++yy)
    {
        for (int xx = 0; xx < dCam.dim().x; ++xx)
        {
            int depth = 0;
            iStream >> depth;
            frame.pixel({ xx, yy }) = static_cast<unsigned short>(depth);
        }
    }
    return frame;
}
Then I call htk.update(std::move(dimage)) to recognize the hand pose. Unfortunately, I get a wrong result like this.
I also tried some other gestures, but I didn't get good results.
How can I make HandTracker work better on MSRA's hand dataset? Am I missing something important?
The depth image passed into the hand tracking viewer appears white in the background instead of black, as typically seen in the synthetic viewer. Note that some (but not all) depth cameras use a pixel value of 0 to indicate that depth is unknown at that pixel, rather than meaning the depth is 0. The system here wasn't programmed to accommodate that case; for segmentation purposes, it assumes such pixels are foreground instead of background. In other words, it expects background pixels to be 'distant', so any depth==0 pixels could probably just be pushed back to a safe distance beyond a meter or so. Since the depth buffer uses the unsigned short data type in mm units, perhaps use a value of a few thousand.
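Here is a minimal sketch of that preprocessing step, reusing the Image/DCamera accessors from the LoadFrame snippet above (the helper name and the 3000 mm default are my own choices, not part of the repo):

// Push depth==0 ("unknown") pixels back to a far background distance so the
// segmenter treats them as background. Depth is unsigned short in mm, so a
// few thousand is safely beyond the ~1 m working range.
void PushUnknownToBackground(Image<unsigned short>& frame, const DCamera& dCam,
                             unsigned short farDepth = 3000)
{
    for (int yy = 0; yy < dCam.dim().y; ++yy)
        for (int xx = 0; xx < dCam.dim().x; ++xx)
            if (frame.pixel({ xx, yy }) == 0)
                frame.pixel({ xx, yy }) = farDepth;
}

Calling this on the frame right after LoadFrame, before htk.update, should make the background look black in the viewer.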
There's no sophistication to the oversimplified hand location/segmentation part of the pipeline. The system expects one and only one hand in the scene, and that it can easily detect the entry point of the wrist into the image by scanning the border. It uses this entry point and the average of the non-background pixels to find the alignment of the forearm, so that it can rotate, crop, and resize the image, giving the CNN a 64x64 upright input image of a single hand (a sketch of this heuristic follows below). It looks like the sample image in the picture you referenced may not have the same sort of obvious entry point.
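To make the heuristic concrete, here is an illustrative sketch (not the repo's actual code) of the two measurements it relies on, assuming a w x h unsigned-short depth buffer in mm where background pixels sit at or beyond farThresh; the tiny float2 struct is defined locally for the example (the repo has its own vector types):

#include <cmath>

struct float2 { float x, y; };

// Scan the image border for the first near pixel: the assumed wrist entry point.
float2 FindEntryPoint(const unsigned short* depth, int w, int h, unsigned short farThresh)
{
    for (int x = 0; x < w; ++x) {                        // top and bottom rows
        if (depth[x] < farThresh)               return { (float)x, 0.0f };
        if (depth[(h - 1) * w + x] < farThresh) return { (float)x, (float)(h - 1) };
    }
    for (int y = 0; y < h; ++y) {                        // left and right columns
        if (depth[y * w] < farThresh)           return { 0.0f, (float)y };
        if (depth[y * w + (w - 1)] < farThresh) return { (float)(w - 1), (float)y };
    }
    return { -1.0f, -1.0f };                             // no entry point found
}

// Centroid of all non-background pixels.
float2 ForegroundCentroid(const unsigned short* depth, int w, int h, unsigned short farThresh)
{
    double sx = 0, sy = 0; long n = 0;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if (depth[y * w + x] < farThresh) { sx += x; sy += y; ++n; }
    return n ? float2{ float(sx / n), float(sy / n) } : float2{ -1.0f, -1.0f };
}

// Angle of the forearm direction (entry -> centroid) relative to image "up";
// rotating the image by -angle before cropping makes the hand upright.
float ForearmAngle(float2 entry, float2 centroid)
{
    return std::atan2(centroid.x - entry.x, -(centroid.y - entry.y));
}

If FindEntryPoint returns {-1, -1}, the pipeline has nothing to align against, which is exactly what happens when the arm is cropped out of the frame.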
The CNN was trained on data from a head-mounted Intel RealSense SR300 camera, using only about 100K frames of the right hand of one middle-aged 150-pound guy with fingers designed to navigate the frets of a guitar. So performance may not be as good for any other user, any other camera, or any other camera placement relative to the hand.
Thanks very much for your reply. It now works on my data. I did two things following your tips (a rough sketch of the combined changes follows below):
1. Changed the background depth value to 1000 millimeters.
2. The code couldn't find an entry point because the arm depth information was missing. After adding a virtual arm (I set some depth values in the arm region manually), I finally got correct inference results.
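For anyone hitting the same problem, here is a sketch of the modified loader. The 16-pixel-wide arm strip painted down to the bottom border is only a rough illustration of what I did by hand, not a general solution:

#include <algorithm>

Image<unsigned short> LoadFrameFixed(const DCamera& dCam, const std::string& filename)
{
    const unsigned short background = 1000;            // mm
    Image<unsigned short> frame = LoadFrame(dCam, filename);

    // Pass 1: push unknown (0) pixels to the background and find the lowest
    // hand pixel (anything closer than the background counts as hand here).
    int handBottom = -1, handX = 0;
    unsigned short handDepth = background;
    for (int yy = 0; yy < dCam.dim().y; ++yy)
        for (int xx = 0; xx < dCam.dim().x; ++xx)
        {
            unsigned short& d = frame.pixel({ xx, yy });
            if (d == 0) d = background;
            else if (d < background && yy > handBottom) { handBottom = yy; handX = xx; handDepth = d; }
        }

    // Pass 2: paint a narrow "virtual arm" strip at the hand's depth from the
    // hand down to the image border so the border scan finds an entry point.
    if (handBottom >= 0)
        for (int yy = handBottom + 1; yy < dCam.dim().y; ++yy)
            for (int xx = std::max(0, handX - 8); xx < std::min(dCam.dim().x, handX + 8); ++xx)
                frame.pixel({ xx, yy }) = handDepth;

    return frame;
}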
We can close this issue now.
Thanks very much, @melax.