davisking / dlib

A toolkit for making real world machine learning and data analysis applications in C++
http://dlib.net
Boost Software License 1.0

How to train a CNN to learn 2D coordinate information? (like shape_predictor_trainer) #2136

Closed ToneTec2019 closed 4 years ago

ToneTec2019 commented 4 years ago

Hello everyone. I've been using dlib for a few months now, and this library has helped me so much with my work and my hobby!

For the past few weeks I've been trying to implement "DeepPose" (Human Pose Estimation via CNN) with dlib.

I've been looking for a way to train a CNN on a dataset of two-dimensional coordinate information (※1) using dlib's deep learning API, but I couldn't find one.

Sorry for the layman's question, but could you please tell me how to train a CNN to learn 2D coordinate information (like training a shape_predictor) using dlib's deep learning API?

Any comments will be appreciated... Thanks.

※1 I am using the MPII dataset converted to an XML file, like this:

<dataset>
    <name>training pose data</name>
    <comment>These are images from mpii dataset</comment>
    <images>
        <image file="042290663.jpg">
            <box height="720" left="0" top="0" width="1280">
                <part name="r_ankle" x="-1" y="-1" />
                <part name="r_knee" x="-1" y="-1" />
                <part name="r_hip" x="566" y="563" />
                <part name="l_hip" x="528" y="553" />
                <part name="l_knee" x="576" y="710" />
                <part name="l_ankle" x="-1" y="-1" />
                <part name="pelvis" x="547" y="558" />
                <part name="thorax" x="540" y="334" />
                <part name="upper_neck" x="546" y="307" />
                <part name="head_top" x="580" y="161" />
                <part name="r_wrist" x="824" y="410" />
                <part name="r_elbow" x="694" y="433" />
                <part name="r_shoulder" x="586" y="346" />
                <part name="l_shoulder" x="493" y="322" />
                <part name="l_elbow" x="413" y="417" />
                <part name="l_wrist" x="419" y="373" />
            </box>
        </image>
        <image file= .......
arrufat commented 4 years ago

Hi, I have already built networks that do this kind of thing using dlib. If you want to train a neural network to output keypoints, you basically have two options:

- train the network to regress the keypoint coordinates directly, or
- train the network to output one heatmap per keypoint.

The first approach will let you output the exact coordinates of the keypoints, but in my experience I never got really good results with it.
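As a side note, here is a minimal, stdlib-only sketch of how the training targets for that first, direct-regression approach could be prepared from the XML data above. `Part` and `make_regression_targets` are hypothetical names, not dlib APIs; the idea is to normalize each joint's (x, y) to the bounding box and to flag invisible joints (stored as x = -1 in the MPII XML) so they can be masked out of the loss:

```cpp
#include <vector>

// One joint as stored in the XML above; x == -1 marks an invisible joint.
struct Part { float x, y; };

// Hypothetical helper: map joint coordinates into [0, 1] relative to the
// bounding box, emitting (x, y, visible) triples. Invisible joints get a
// 0 visibility flag so they can be masked out of a regression loss.
std::vector<float> make_regression_targets(
    const std::vector<Part>& parts,
    float box_left, float box_top, float box_width, float box_height)
{
    std::vector<float> targets;
    for (const auto& p : parts)
    {
        const bool visible = p.x >= 0 && p.y >= 0;
        targets.push_back(visible ? (p.x - box_left) / box_width  : 0.f);
        targets.push_back(visible ? (p.y - box_top)  / box_height : 0.f);
        targets.push_back(visible ? 1.f : 0.f);
    }
    return targets;
}
```

In dlib itself, a vector of targets like this could plausibly be fed to a network trained with a mean-squared multi-output loss, but the exact network definition is up to you.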

So I recommend you use the second approach. To train the network, you need to generate a heatmap for each keypoint and place them in a std::array<matrix<float>, N>, where N is the number of keypoints (16 in your case).

Then you train the network to output those heatmaps, and to recover the coordinates you just find the position of the bright point on each heatmap. You can do that manually, by thresholding, etc., or even use find_bright_keypoints.
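The heatmap round trip described above can be sketched without any dlib dependencies. All names below are hypothetical; a plain row-major buffer stands in for one channel of the std::array<dlib::matrix<float>, N> targets, and the decode step is the simplest possible argmax rather than anything dlib-specific:

```cpp
#include <cmath>
#include <utility>
#include <vector>

// A single-channel heatmap stored row-major; in dlib this would be one
// element of std::array<matrix<float>, N> (one channel per keypoint).
struct Heatmap
{
    int rows, cols;
    std::vector<float> data;
    float& at(int r, int c)       { return data[r * cols + c]; }
    float  at(int r, int c) const { return data[r * cols + c]; }
};

// Encode: render a Gaussian bump centered on the keypoint. Sigma controls
// how spread out the bright spot is; a few pixels is a typical choice.
Heatmap render_heatmap(int rows, int cols, float kx, float ky, float sigma)
{
    Heatmap h{rows, cols, std::vector<float>(rows * cols, 0.f)};
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
        {
            const float dx = c - kx, dy = r - ky;
            h.at(r, c) = std::exp(-(dx * dx + dy * dy) / (2 * sigma * sigma));
        }
    return h;
}

// Decode: recover the keypoint as the location of the brightest pixel.
std::pair<int, int> brightest_point(const Heatmap& h)  // returns (x, y)
{
    int best_r = 0, best_c = 0;
    for (int r = 0; r < h.rows; ++r)
        for (int c = 0; c < h.cols; ++c)
            if (h.at(r, c) > h.at(best_r, best_c)) { best_r = r; best_c = c; }
    return {best_c, best_r};
}
```

A trained network would output a noisy version of such a heatmap per keypoint, and the same argmax (or thresholding) decode would still apply.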

In #1863, there was an example on how to do this in a test, but it was removed due to complexity: https://github.com/davisking/dlib/pull/1863/commits/1000b30fcad6ac2beaaff5c2a2aad1678bf77683#diff-7d2445557c03b31a758c2ef7d310bc53L2558-L2653.

If I find a simple keypoint dataset, I might contribute an example at some point :)

Reference: Simple Baselines for Human Pose Estimation and Tracking by Bin Xiao, Haiping Wu and Yichen Wei.

ToneTec2019 commented 4 years ago

Thank you!!!!! I'll try it that way.