Daniil-Osokin / lightweight-human-pose-estimation-3d-demo.pytorch

Real-time 3D multi-person pose estimation demo in PyTorch. OpenVINO backend can be used for fast inference on CPU.
Apache License 2.0

Will it run on mobile? #26

Closed: fm64hylian closed this issue 4 years ago

fm64hylian commented 4 years ago

Hi, I am trying to develop for Unity on Android and have tested a lot of body pose models (single-person is enough) using OpenCV for Unity. The goal is to attach a 3D humanoid avatar to the joints; however, most models only provide 2D points, so this project could be the solution.

I am trying to deploy the model and convert it to an ONNX file so I can read it from Unity, but I am having trouble running convert_to_onnx.py on Windows 10. I tried using the Windows Python distribution and could not install the packages from requirements.txt, so I tried a new environment using Anaconda. Finally, I ended up installing all packages with conda install, and when executing python setup.py build_ext, I get the following error:

running build_ext
-- Building for: NMake Makefiles
CMake Error at CMakeLists.txt:2 (project):
  Generator
    NMake Makefiles
  does not support platform specification, but platform
    x64
  was specified.

CMake Error: CMAKE_C_COMPILER not set, after EnableLanguage
CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage

I have Android Studio and Visual Studio installed as well, which contain different versions of CMake, but I don't really know how I should configure that. So instead: is the ONNX model available somewhere else? If it is, is it possible to get the input and output names so I can extract the joint information?

Those were like 3 questions in one, but if you want, I can open separate issues to discuss:

1.- Will it run on a device? During compilation a "CUDA required" message appeared, so maybe not, unless OpenVINO has some magical way to make it run on CPU only.

2.- Issues during installation and setup, such as requirements.txt not working as expected.

3.- The ONNX model's relevant output names to get the 3D points.

If there is any point that needs further explanation, please let me know.

EDIT: I managed to run it on a different PC, but now when I execute the line python scripts/convert_to_onnx.py --checkpoint-path human-pose-estimation-3d.pth

I get this error:

Traceback (most recent call last):
  File "scripts/convert_to_onnx.py", line 5, in <module>
    from models.with_mobilenet import PoseEstimationWithMobileNet
ModuleNotFoundError: No module named 'models'

I have also set the same PYTHONPATH as before.

Thank you

Daniil-Osokin commented 4 years ago

Hi! You can run the Python demo as is, without running setup.py. Running setup.py builds a performance-critical C++ part of the code, which is used instead of the default Python version. If you want to run it on a device, I believe you need to adapt the current code for Unity: rewrite it in C# or build a C++ library. The first error you posted says that a C++ compiler was not found, so you need to install one. The second error says that there is no module named models, so possibly the models folder is missing from the location you run the script from.
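
For example, a minimal workaround sketch (illustrative, not part of the repository's script): prepend the repository root, the folder that contains models/, to sys.path at the top of scripts/convert_to_onnx.py before the failing import runs:

    import os
    import sys

    # assumption: this file lives in scripts/, so the repository root is one level up
    sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

    from models.with_mobilenet import PoseEstimationWithMobileNet  # now resolvable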

fm64hylian commented 4 years ago

Thank you for your reply. In that case, should I move convert_to_onnx.py from the scripts folder to the root folder?

I want to read the input channels from demo.py so I can port that code to C# using the OpenCV for Unity plugin, which is just a C# wrapper around OpenCV. I am not an expert in Python, but I was checking parse_poses.py and found these values:

map_id_to_panoptic = [1, 0, 9, 10, 11, 3, 4, 5, 12, 13, 14, 6, 7, 8, 15, 16, 17, 18]

limbs = [[18, 17, 1],
         [16, 15, 1],
         [5, 4, 3],
         [8, 7, 6],
         [11, 10, 9],
         [14, 13, 12]]

I just want to make sure that these are the same COCO indexes and you just used them in a different order. For COCO, the values are:

          [  { "Nose", 0 }, { "Neck", 1 }, { "RShoulder", 2 }, { "RElbow", 3 }, {"RWrist",4},
            { "LShoulder",5 }, { "LElbow", 6 }, { "LWrist", 7 }, { "RHip", 8 }, {"RKnee",9},
            { "RAnkle", 10 }, { "LHip", 11 }, { "LKnee", 12 }, { "LAnkle", 13 }, {"REye",14},
            { "LEye", 15 }, { "REar", 16 }, { "LEar", 17 }, {"Background",18}]

So the question is: when you are reading the 3D points from the model, are they already included in the output? Could you please tell me which positions they occupy in the output values? Just to give an example, this is how I read the COCO model in my C# code (based on this project: https://github.com/faem/OpenPose-Unity ), where according to the COCO documentation, the first 19 channels are body parts and the other 38 are PAFs:

    Dictionary<string, int> BODY_PARTS = new Dictionary<string, int>() {
            { "Nose", 0 }, { "Neck", 1 }, { "RShoulder", 2 }, { "RElbow", 3 }, {"RWrist",4},
            { "LShoulder",5 }, { "LElbow", 6 }, { "LWrist", 7 }, { "RHip", 8 }, {"RKnee",9},
            { "RAnkle", 10 }, { "LHip", 11 }, { "LKnee", 12 }, { "LAnkle", 13 }, {"Background",18}
       // , {"REye",14}, { "LEye", 15 }, { "REar", 16 }, { "LEar", 17 }, 
        };

    string[,] POSE_PAIRS = new string[,] {
            { "Neck", "RShoulder" }, { "Neck", "LShoulder" }, {"RShoulder","RElbow"},
            { "RElbow", "RWrist" }, { "LShoulder", "LElbow" }, {"LElbow","LWrist"},
            { "Neck", "RHip" }, { "RHip", "RKnee" }, { "RKnee", "RAnkle" }, {"Neck","LHip"},
            { "LHip", "LKnee" }, { "LKnee", "LAnkle" }, { "Neck", "Nose" },
            //{"Nose","REye"}, { "REye", "REar" }, { "Nose", "LEye" }, { "LEye", "LEar" }
    };

net = Dnn.readNetFromTensorflow(graph_filepath);

//...
        Mat rgbaMat = webCamTextureToMatHelper.GetMat();
        Imgproc.cvtColor(rgbaMat, rgbaMat, Imgproc.COLOR_RGBA2BGR);
        gameObject.transform.localScale = new Vector3(rgbaMat.width(), rgbaMat.height(), 1);

        if (net == null)
        {
            return;
        }

        float frameWidth = rgbaMat.cols();
        float frameHeight = rgbaMat.rows();
        Mat input = Dnn.blobFromImage(rgbaMat, 1.0, new Size(inWidth, inHeight), new Scalar(0, 0, 0), false, false);
        net.setInput(input, "image");
        //forward() gives a 4D blob (dim1 image id, dim2 channel index, dim3 height, dim4 width)
        Mat output = net.forward("Openpose/concat_stage7");

        //COCO model output consists of 57 channels: 18 keypoint confidence maps + 1 background + 19 * 2 Part Affinity Map channels

        //Changes the shape and/or the number of channels of a 2D matrix without copying the data.
        output = output.reshape(1, 57); // reshape to 1 channel with 57 rows, one per output channel

        List<Point> bodyPoints2D = new List<Point>();

        //the first 18 rows contain the body keypoints; we use only some of them (face parts removed, check the indexes)
        //for (int i = 0; i < BODY_PARTS.Count; i++)
        foreach (var kvp in BODY_PARTS)
        {
            int channelIndex = kvp.Value;
            //Probability map of corresponding body part.
            Mat heatMap = output.row(channelIndex).reshape(1, 46);
            //check whether the keypoint is effective (confidence > threshold)
            Core.MinMaxLocResult result = Core.minMaxLoc(heatMap);
            heatMap.Dispose();

            double x = (frameWidth * result.maxLoc.x) / 46;
            double y = (frameHeight * result.maxLoc.y) / 46;

            //add to the detected points if it passes the threshold
            bodyPoints2D.Add(result.maxVal > 0.3 ? new Point(x, y) : null);
        }
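
For reference, the same per-channel peak extraction in Python (a sketch of what the C# loop above does; it assumes the usual 57-channel COCO output with 46x46 heatmaps and the raw 4D blob, before any reshape):

    import numpy as np

    def extract_points(output, frame_w, frame_h, n_channels=19, hm_size=46, thresh=0.3):
        """output: raw network blob of shape (1, 57, 46, 46)."""
        points = []
        for c in range(n_channels):
            heat = output[0, c]  # one 46x46 confidence map
            y, x = np.unravel_index(np.argmax(heat), heat.shape)
            # rescale the heatmap peak to frame coordinates; drop weak detections
            points.append((frame_w * x / hm_size, frame_h * y / hm_size)
                          if heat[y, x] > thresh else None)
        return points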

I also noted that you are using an average height for humans. Since our goal is to take the person's measurements accordingly, do you think it is possible to calculate an approximate value from the nose to the ankles using the points? Does the code apply any camera calibration to get the right values depending on the distance to the camera?
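
What I have in mind is something like the sketch below; it assumes each pose row from parse_poses reshapes to 19 keypoints of [x, y, z, confidence] in panoptic order (nose at index 1, ankles at 8 and 14, as in the mapping above), and the result would be in the model's normalized space rather than metric units:

    import numpy as np

    def nose_to_ankles_span(pose_3d_row):
        # hypothetical helper: approximate span from the nose to the mid-ankle point
        kpts = np.asarray(pose_3d_row).reshape(19, -1)[:, :3]  # x, y, z per joint
        mid_ankle = (kpts[8] + kpts[14]) / 2.0  # assumed left/right ankle indices
        return np.linalg.norm(kpts[1] - mid_ankle)  # assumed nose index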

I am sorry to ask such newbie questions, but I only started with deep learning a few weeks ago. Please let me know if anything is not clearly explained.

UPDATE: I moved the model and convert_to_onnx.py to the root, and despite this warning:

[WARNING] Not found pre-trained parameters for fake_conv_heatmaps.weight
[WARNING] Not found pre-trained parameters for fake_conv_pafs.weight

the ONNX file was generated. OpenCV for Unity provides a Dnn.readNetFromONNX() method, so all that is left is to read the model and get the correct inputs!
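
Before the C# port I will probably sanity-check the exported file with OpenCV's Python bindings, roughly like this (the input size and the all-zeros image are placeholders; the real preprocessing has to match demo.py):

    import cv2
    import numpy as np

    net = cv2.dnn.readNetFromONNX('human-pose-estimation-3d.onnx')
    # dummy 256x448 frame; real normalization/size must match the demo's preprocessing
    blob = cv2.dnn.blobFromImage(np.zeros((256, 448, 3), np.uint8), 1.0, (448, 256))
    net.setInput(blob)
    out_names = net.getUnconnectedOutLayersNames()
    for name, out in zip(out_names, net.forward(out_names)):
        print(name, out.shape)  # inspect output names and shapes before porting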

Thank you.

Daniil-Osokin commented 4 years ago

For the keypoint order used, I suggest checking #15. The function parse_poses returns poses in the 3D keypoint order, so you may use it. AVG_PERSON_HEIGHT is just a normalization constant.
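
Roughly as in demo.py (a sketch; inference_result and input_scale come from the network forward pass and the preprocessing step, and the stride/fx values here mirror the demo's defaults):

    from modules.parse_poses import parse_poses

    def get_3d_keypoints(inference_result, input_scale, fx=-1):
        # stride 8 matches the demo's network; fx < 0 falls back to an estimated focal length
        poses_3d, poses_2d = parse_poses(inference_result, input_scale, 8, fx, is_video=True)
        # demo.py reshapes each row into 19 keypoints of x, y, z (confidence dropped)
        return poses_3d.reshape(poses_3d.shape[0], 19, -1)[:, :, :3], poses_2d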

fm64hylian commented 4 years ago

I see. The other issues were not being displayed, so I could not check. I'll let you know how the C# implementation goes.

Thank you

Daniil-Osokin commented 4 years ago

Hope it helped.