CMU-Perceptual-Computing-Lab / openpose

OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
https://cmu-perceptual-computing-lab.github.io/openpose

Python API and GPU Issue #820

Closed skcskc7 closed 6 years ago

skcskc7 commented 6 years ago

Issue Summary

I am using the OpenPose Python API. Does the Python API support multi-GPU? And is it possible to do pose estimation on video instead of images? How many GPUs in total do you support?

Executed Command (if any)

Note: add --logging_level 0 --disable_multi_thread to get higher debug information.

OpenPose Output (if any)

Type of Issue


Your System Configuration

  1. OpenPose version: Latest GitHub code? Or specific commit (e.g., d52878f)? Or specific version from Release section (e.g., 1.2.0)?

    • Latest
  2. General configuration:

    • Installation mode: CMake
    • Operating system (lsb_release -a in Ubuntu): Ubuntu 14.04.5 LTS
    • Release or Debug mode? (by default: release): release
    • Compiler (gcc --version in Ubuntu or VS version in Windows): gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4

  3. Non-default settings:

    • 3-D Reconstruction module added? (by default: no):
    • Any other custom CMake configuration with respect to the default version? (by default: no):
  4. 3rd-party software:

    • Caffe version: Default from OpenPose
    • CMake version (cmake --version in Ubuntu): cmake version 3.9.3
    • OpenCV version: pre-compiled, apt-get install libopencv-dev (only Ubuntu)
  5. If GPU mode issue:

    • CUDA version (cat /usr/local/cuda/version.txt in most cases): CUDA Version 8.0.61
    • cuDNN version: 5.0.1
    • GPU model (nvidia-smi in Ubuntu): 1080 Ti
  6. If CPU-only mode issue:

    • CPU brand & model:
    • Total RAM memory available:
  7. If Python API:

    • Python version: 2.7
    • Numpy version (python -c "import numpy; print numpy.version.version" in Ubuntu):
  8. If Windows system:

    • Portable demo or compiled library?
  9. If speed performance issue:

    • Report OpenPose timing speed based on this link.
gineshidalgo99 commented 6 years ago

Best

soulslicer commented 6 years ago

You can use Python's multiprocessing library to do this yourself: construct one OpenPose object per GPU, one per process.
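A minimal sketch of that idea (untested, assuming the Python API is built and on the PYTHONPATH, and with the model folder and image paths as placeholders): one process per GPU, each constructing its own wrapper pinned to one device via the standard num_gpu / num_gpu_start flags.

import multiprocessing as mp

def worker(gpu_id, image_paths):
    # Import inside the process so each worker initializes CUDA independently
    import cv2
    from openpose import pyopenpose as op

    params = dict()
    params["model_folder"] = "models/"  # placeholder, adjust to your install
    params["num_gpu"] = 1               # each process drives a single GPU...
    params["num_gpu_start"] = gpu_id    # ...starting at its own device index

    opWrapper = op.WrapperPython()
    opWrapper.configure(params)
    opWrapper.start()

    for path in image_paths:
        datum = op.Datum()
        datum.cvInputData = cv2.imread(path)
        opWrapper.emplaceAndPop([datum])  # list-style API, as in OpenPose ~1.5
        print("GPU %d: %s" % (gpu_id, str(datum.poseKeypoints)))

if __name__ == "__main__":
    paths = ["examples/media/COCO_val2014_000000000192.jpg",
             "examples/media/COCO_val2014_000000000241.jpg"]
    # Round-robin split of the workload across 2 GPUs
    jobs = [mp.Process(target=worker, args=(g, paths[g::2])) for g in range(2)]
    for j in jobs: j.start()
    for j in jobs: j.join()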

moncio commented 5 years ago

Is it planned to introduce this functionality in the Python API, in the next release or in the future? Thanks in advance!

gineshidalgo99 commented 5 years ago

Multi-GPU? It should already work.
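For reference, the Python wrapper accepts the same flags as the CLI demos through the params dictionary, so multi-GPU should be a matter of setting num_gpu / num_gpu_start. A minimal sketch, with the models path as a placeholder:

from openpose import pyopenpose as op

params = dict()
params["model_folder"] = "models/"  # placeholder, adjust to your install
params["num_gpu"] = 2               # number of GPUs to use
params["num_gpu_start"] = 0         # first GPU device index

opWrapper = op.WrapperPython()
opWrapper.configure(params)
opWrapper.start()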

moncio commented 5 years ago

For the Python version? I was testing it and comparing it with the C++ version, and the results were totally different.

soulslicer commented 5 years ago

What do you mean, they are different?

moncio commented 5 years ago

I was testing the example "05_keypoints_from_images_multi_gpu", in both the Python and the C++ versions, and the results were different. I changed the example to read a video (the example video in the media folder) using OpenCV. My computer has 2 graphics cards (2x NVIDIA GTX 1080). I monitor the GPUs with "watch nvidia-smi", and in both cases I observe that utilization increases on both cards. The problem is the speed: when I run the C++ version, processing the video takes about 7.5 seconds, but the Python version takes about double that (about 15 seconds). Analyzing this behaviour, I ask whether this is normal or whether I'm doing something wrong. Thanks!

soulslicer commented 5 years ago

Hmm, I tested it on the image folder example. As I recall, that gave the same speed. Maybe you can send me a code example?

moncio commented 5 years ago

Of course, here's my Python code:

from openpose import pyopenpose as op

import sys
import cv2
import os
from sys import platform
import argparse
import time

parser = argparse.ArgumentParser()
parser.add_argument("--video", default="examples/media/video.mp4", help="Read input video (avi).")
parser.add_argument("--no_display", default=False, help="Enable to disable the visual display.")
args = parser.parse_known_args()

params = dict()
params["model_folder"] = "MODELS_PATH"
params["disable_multi_thread"] = "false"

# Forward any remaining command-line flags to OpenPose as params
for i in range(0, len(args[1])):
    curr_item = args[1][i]
    if i != len(args[1])-1: next_item = args[1][i+1]
    else: next_item = "1"
    if "--" in curr_item and "--" in next_item:
        key = curr_item.replace('-','')
        if key not in params: params[key] = "1"
    elif "--" in curr_item and "--" not in next_item:
        key = curr_item.replace('-','')
        if key not in params: params[key] = next_item

# Resolve the GPU count after flag parsing, so a --num_gpu flag is honored
numberGPUs = int(params["num_gpu"]) if "num_gpu" in params else op.get_gpu_number()

try:
    opWrapper = op.WrapperPython()
    opWrapper.configure(params)
    opWrapper.start()

    videoPath = args[0].video

    start = time.time()

    cap = cv2.VideoCapture(videoPath)

    while cap.isOpened():
        grabbed, frame = cap.read()

        if frame is None or not grabbed:
            print("Finish reading video frames...")
            break

        datums = []

        for gpuId in range(0, numberGPUs):

            datum = op.Datum()
            datum.cvInputData = frame
            datums.append(datum)
            opWrapper.waitAndEmplace([datums[-1]])

        for gpuId in range(0, numberGPUs):

            datum = datums[gpuId]
            opWrapper.waitAndPop([datum])

            print("Body keypoints: \n" + str(datum.poseKeypoints))

            if not args[0].no_display:
                cv2.imshow("OpenPose 1.5.0 - Tutorial Python API", datum.cvOutputData)
                key = cv2.waitKey(1)
                if key == 27: break

    end = time.time()
    print("OpenPose demo successfully finished. Total time: " + str(end - start) + " seconds")
except Exception as e:
    print(e)
    sys.exit(-1)
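For comparison with the C++ code below: there each GPU receives a distinct frame per round (imageId = imageBaseId + gpuId), while the loop above queues the same frame once per GPU, so with 2 GPUs every frame would be processed twice. A sketch of the C++-style batching, reusing the same calls as the script above:

# Sketch: each round reads numberGPUs *distinct* frames, one per queued datum
while cap.isOpened():
    datums = []
    for gpuId in range(0, numberGPUs):
        grabbed, frame = cap.read()
        if frame is None or not grabbed:
            break
        datum = op.Datum()
        datum.cvInputData = frame
        datums.append(datum)
        opWrapper.waitAndEmplace([datum])
    if not datums:
        print("Finish reading video frames...")
        break
    for datum in datums:
        opWrapper.waitAndPop([datum])
        print("Body keypoints: \n" + str(datum.poseKeypoints))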

And here's my C++ code (in this version I save all video frames into a vector and then loop over them):

// --------------- OpenPose C++ API Tutorial - Example 5 - Body from images and multi GPU ---------------
// It reads images, process them, and display them with the pose (and optionally hand and face) keypoints. In addition,
// it includes all the OpenPose configuration flags (enable/disable hand, face, output saving, etc.).

// Command-line user interface
#define OPENPOSE_FLAGS_DISABLE_PRODUCER
#define OPENPOSE_FLAGS_DISABLE_DISPLAY
#include <openpose/flags.hpp>
// OpenPose dependencies
#include <openpose/headers.hpp>
#include <opencv4/opencv2/opencv.hpp>

using namespace std;
using namespace cv;

// Custom OpenPose flags
// Producer
DEFINE_string(video, "examples/media/video.mp4",
    "Process input video.");
// OpenPose
DEFINE_bool(latency_is_irrelevant_and_computer_with_lots_of_ram, false,
    "If false, it will read and then then process images right away. If true, it will first store all the frames and"
    " later process them (slightly faster). However: 1) Latency will hugely increase (no frames will be processed"
    " until they have all been read). And 2) The program might go out of RAM memory with long videos or folders with"
    " many images (so the computer might freeze).");
// Display
DEFINE_bool(no_display,                 false,
    "Enable to disable the visual display.");

// This function displays the rendered frame and reports whether the user pressed Esc
bool display(const std::shared_ptr<std::vector<std::shared_ptr<op::Datum>>>& datumsPtr)
{
    try
    {
        // User's displaying/saving/other processing here
            // datum.cvOutputData: rendered frame with pose or heatmaps
            // datum.poseKeypoints: Array<float> with the estimated pose
        if (datumsPtr != nullptr && !datumsPtr->empty())
        {
            // Display image and sleeps at least 1 ms (it usually sleeps ~5-10 msec to display the image)
            cv::imshow(OPEN_POSE_NAME_AND_VERSION + " - Tutorial C++ API", datumsPtr->at(0)->cvOutputData);
        }
        else
            op::log("Nullptr or empty datumsPtr found.", op::Priority::High);
        const auto key = (char)cv::waitKey(1);
        return (key == 27);
    }
    catch (const std::exception& e)
    {
        op::error(e.what(), __LINE__, __FUNCTION__, __FILE__);
        return true;
    }
}

void printKeypoints(const std::shared_ptr<std::vector<std::shared_ptr<op::Datum>>>& datumsPtr)
{
    try
    {
        // Example: How to use the pose keypoints
        if (datumsPtr != nullptr && !datumsPtr->empty())
        {
            op::log("Body keypoints: " + datumsPtr->at(0)->poseKeypoints.toString(), op::Priority::High);
            op::log("Face keypoints: " + datumsPtr->at(0)->faceKeypoints.toString(), op::Priority::High);
            op::log("Left hand keypoints: " + datumsPtr->at(0)->handKeypoints[0].toString(), op::Priority::High);
            op::log("Right hand keypoints: " + datumsPtr->at(0)->handKeypoints[1].toString(), op::Priority::High);
        }
        else
            op::log("Nullptr or empty datumsPtr found.", op::Priority::High);
    }
    catch (const std::exception& e)
    {
        op::error(e.what(), __LINE__, __FUNCTION__, __FILE__);
    }
}

void configureWrapper(op::Wrapper& opWrapper)
{
    try
    {
        // Configuring OpenPose

        // logging_level
        op::check(0 <= FLAGS_logging_level && FLAGS_logging_level <= 255, "Wrong logging_level value.",
                  __LINE__, __FUNCTION__, __FILE__);
        op::ConfigureLog::setPriorityThreshold((op::Priority)FLAGS_logging_level);
        op::Profiler::setDefaultX(FLAGS_profile_speed);

        // Applying user defined configuration - GFlags to program variables
        // outputSize
        const auto outputSize = op::flagsToPoint(FLAGS_output_resolution, "-1x-1");
        // netInputSize
        const auto netInputSize = op::flagsToPoint(FLAGS_net_resolution, "-1x368");
        // faceNetInputSize
        const auto faceNetInputSize = op::flagsToPoint(FLAGS_face_net_resolution, "368x368 (multiples of 16)");
        // handNetInputSize
        const auto handNetInputSize = op::flagsToPoint(FLAGS_hand_net_resolution, "368x368 (multiples of 16)");
        // poseMode
        const auto poseMode = op::flagsToPoseMode(FLAGS_body);
        // poseModel
        const auto poseModel = op::flagsToPoseModel(FLAGS_model_pose);
        // JSON saving
        if (!FLAGS_write_keypoint.empty())
            op::log("Flag `write_keypoint` is deprecated and will eventually be removed."
                    " Please, use `write_json` instead.", op::Priority::Max);
        // keypointScaleMode
        const auto keypointScaleMode = op::flagsToScaleMode(FLAGS_keypoint_scale);
        // heatmaps to add
        const auto heatMapTypes = op::flagsToHeatMaps(FLAGS_heatmaps_add_parts, FLAGS_heatmaps_add_bkg,
                                                      FLAGS_heatmaps_add_PAFs);
        const auto heatMapScaleMode = op::flagsToHeatMapScaleMode(FLAGS_heatmaps_scale);
        // >1 camera view?
        const auto multipleView = (FLAGS_3d || FLAGS_3d_views > 1);
        // Face and hand detectors
        const auto faceDetector = op::flagsToDetector(FLAGS_face_detector);
        const auto handDetector = op::flagsToDetector(FLAGS_hand_detector);
        // Enabling Google Logging
        const bool enableGoogleLogging = true;

        // Pose configuration (use WrapperStructPose{} for default and recommended configuration)
        const op::WrapperStructPose wrapperStructPose{
            poseMode, netInputSize, outputSize, keypointScaleMode, FLAGS_num_gpu, FLAGS_num_gpu_start,
            FLAGS_scale_number, (float)FLAGS_scale_gap, op::flagsToRenderMode(FLAGS_render_pose, multipleView),
            poseModel, !FLAGS_disable_blending, (float)FLAGS_alpha_pose, (float)FLAGS_alpha_heatmap,
            FLAGS_part_to_show, FLAGS_model_folder, heatMapTypes, heatMapScaleMode, FLAGS_part_candidates,
            (float)FLAGS_render_threshold, FLAGS_number_people_max, FLAGS_maximize_positives, FLAGS_fps_max,
            FLAGS_prototxt_path, FLAGS_caffemodel_path, (float)FLAGS_upsampling_ratio, enableGoogleLogging};
        opWrapper.configure(wrapperStructPose);
        // Face configuration (use op::WrapperStructFace{} to disable it)
        const op::WrapperStructFace wrapperStructFace{
            FLAGS_face, faceDetector, faceNetInputSize,
            op::flagsToRenderMode(FLAGS_face_render, multipleView, FLAGS_render_pose),
            (float)FLAGS_face_alpha_pose, (float)FLAGS_face_alpha_heatmap, (float)FLAGS_face_render_threshold};
        opWrapper.configure(wrapperStructFace);
        // Hand configuration (use op::WrapperStructHand{} to disable it)
        const op::WrapperStructHand wrapperStructHand{
            FLAGS_hand, handDetector, handNetInputSize, FLAGS_hand_scale_number, (float)FLAGS_hand_scale_range,
            op::flagsToRenderMode(FLAGS_hand_render, multipleView, FLAGS_render_pose), (float)FLAGS_hand_alpha_pose,
            (float)FLAGS_hand_alpha_heatmap, (float)FLAGS_hand_render_threshold};
        opWrapper.configure(wrapperStructHand);
        // Extra functionality configuration (use op::WrapperStructExtra{} to disable it)
        const op::WrapperStructExtra wrapperStructExtra{
            FLAGS_3d, FLAGS_3d_min_views, FLAGS_identification, FLAGS_tracking, FLAGS_ik_threads};
        opWrapper.configure(wrapperStructExtra);
        // Output (comment or use default argument to disable any output)
        const op::WrapperStructOutput wrapperStructOutput{
            FLAGS_cli_verbose, FLAGS_write_keypoint, op::stringToDataFormat(FLAGS_write_keypoint_format),
            FLAGS_write_json, FLAGS_write_coco_json, FLAGS_write_coco_json_variants, FLAGS_write_coco_json_variant,
            FLAGS_write_images, FLAGS_write_images_format, FLAGS_write_video, FLAGS_write_video_fps,
            FLAGS_write_video_with_audio, FLAGS_write_heatmaps, FLAGS_write_heatmaps_format, FLAGS_write_video_3d,
            FLAGS_write_video_adam, FLAGS_write_bvh, FLAGS_udp_host, FLAGS_udp_port};
        opWrapper.configure(wrapperStructOutput);
        // No GUI. Equivalent to: opWrapper.configure(op::WrapperStructGui{});
        // Set to single-thread (for sequential processing and/or debugging and/or reducing latency)
        if (FLAGS_disable_multi_thread)
            opWrapper.disableMultiThreading();
    }
    catch (const std::exception& e)
    {
        op::error(e.what(), __LINE__, __FUNCTION__, __FILE__);
    }
}

int tutorialApiCpp()
{
    try
    {
        op::log("Starting OpenPose demo...", op::Priority::High);
        const auto opTimer = op::getTimerInit();

        // Configuring OpenPose
        op::log("Configuring OpenPose...", op::Priority::High);
        op::Wrapper opWrapper{op::ThreadManagerMode::Asynchronous};
        configureWrapper(opWrapper);
        // Increase maximum wrapper queue size
        if (FLAGS_latency_is_irrelevant_and_computer_with_lots_of_ram)
            opWrapper.setDefaultMaxSizeQueues(std::numeric_limits<long long>::max());

        // Starting OpenPose
        op::log("Starting thread(s)...", op::Priority::High);
        opWrapper.start();

        // Read all frames from the video into a vector

        VideoCapture cap(FLAGS_video);
        vector<Mat> framesVideo;
        Mat frame;

        if (!FLAGS_latency_is_irrelevant_and_computer_with_lots_of_ram)
        {
            const auto numberGPUs = op::getGpuNumber();

            while (cap.isOpened()) 
            {
                // cout << "Reading frame..." << index << endl;

                cap >> frame;

                // If the frame is empty, break immediately
                if (frame.empty())
                  break;

                framesVideo.push_back(frame);

            }

            cap.release();

            cout << "Vector of frames created from video" << endl;

            for (auto imageBaseId = 0u ; imageBaseId < framesVideo.size() ; imageBaseId+=numberGPUs)
            {
                // Read and push images into OpenPose wrapper
                for (auto gpuId = 0 ; gpuId < numberGPUs ; gpuId++)
                {
                    const auto imageId = imageBaseId+gpuId;
                    if (imageId < framesVideo.size())
                    {
                        auto imageToProcess = framesVideo.at(imageId);
                        // Faster alternative that moves imageToProcess
                        opWrapper.waitAndEmplace(imageToProcess);
                        // // Slower but safer alternative that copies imageToProcess
                        // const auto imageToProcess = cv::imread(imagePath);
                        // opWrapper.waitAndPush(imageToProcess);
                    }
                }
                // Retrieve processed results from OpenPose wrapper
                for (auto gpuId = 0 ; gpuId < numberGPUs ; gpuId++)
                {
                    const auto imageId = imageBaseId+gpuId;
                    if (imageId < framesVideo.size())
                    {
                        std::shared_ptr<std::vector<std::shared_ptr<op::Datum>>> datumProcessed;
                        const auto status = opWrapper.waitAndPop(datumProcessed);
                        if (status && datumProcessed != nullptr)
                        {
                            //printKeypoints(datumProcessed);
                            if (!FLAGS_no_display)
                            {
                                const auto userWantsToExit = display(datumProcessed);
                                if (userWantsToExit)
                                {
                                    op::log("User pressed Esc to exit demo.", op::Priority::High);
                                    break;
                                }
                            }
                        }
                        else
                            op::log("Image could not be processed.", op::Priority::High);
                    }
                }
            }

        }

        // Measuring total time
        op::printTime(opTimer, "OpenPose demo successfully finished. Total time: ", " seconds.", op::Priority::High);

        // Return
        return 0;
    }
    catch (const std::exception& e)
    {
        return -1;
    }
}

int main(int argc, char *argv[])
{
    // Parsing command line flags
    gflags::ParseCommandLineFlags(&argc, &argv, true);

    // Running tutorialApiCpp
    return tutorialApiCpp();
}

Hope it's clear enough...

soulslicer commented 5 years ago

I will look at it once I get back.

On Wed, Jul 17, 2019 at 4:56 AM moncio notifications@github.com wrote:

Hi, do you prefer I send you the code directly by email?

Thanks!
