You can use Python's multiprocessing library to do this yourself: construct one OpenPose object per GPU, each in its own process.
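A minimal sketch of that approach. It assumes the 1.5-era pyopenpose bindings (where `emplaceAndPop` takes a plain list of `Datum` objects; newer releases wrap it in `op.VectorDatum`) and uses the OpenPose flags `num_gpu`/`num_gpu_start` to pin each process to a single GPU. The helper names `split_round_robin`, `make_params`, `gpu_worker` and `run_on_gpus` are hypothetical, and the "spawn" start method is chosen so that CUDA is only initialized inside the worker processes:

```python
import multiprocessing as mp


def split_round_robin(items, num_parts):
    """Deal items out to num_parts lists, keeping (index, item) pairs so results can be re-ordered."""
    parts = [[] for _ in range(num_parts)]
    for index, item in enumerate(items):
        parts[index % num_parts].append((index, item))
    return parts


def make_params(gpu_id, model_folder):
    # Each process sees exactly one GPU: start at gpu_id and use one device.
    return {"model_folder": model_folder, "num_gpu": 1, "num_gpu_start": gpu_id}


def gpu_worker(gpu_id, indexed_frames, model_folder):
    # Imported here so OpenPose/CUDA is only loaded inside the child process.
    from openpose import pyopenpose as op

    wrapper = op.WrapperPython()
    wrapper.configure(make_params(gpu_id, model_folder))
    wrapper.start()
    results = []
    for index, frame in indexed_frames:
        datum = op.Datum()
        datum.cvInputData = frame
        wrapper.emplaceAndPop([datum])  # assumed 1.5-era list-of-Datum signature
        results.append((index, datum.poseKeypoints))
    return results


def run_on_gpus(frames, num_gpus, model_folder):
    """Process frames on num_gpus GPUs (one OpenPose instance per process); return keypoints in frame order."""
    ctx = mp.get_context("spawn")
    jobs = [(gpu_id, part, model_folder)
            for gpu_id, part in enumerate(split_round_robin(frames, num_gpus))]
    with ctx.Pool(processes=num_gpus) as pool:
        per_gpu = pool.starmap(gpu_worker, jobs)
    merged = sorted((pair for part in per_gpu for pair in part), key=lambda pair: pair[0])
    return [keypoints for _, keypoints in merged]
```

`run_on_gpus` must be called from inside an `if __name__ == "__main__":` guard, which the spawn start method requires. Because every frame keeps its original index, the results come back in frame order even though the GPUs finish independently.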
Are there plans to add this functionality to the Python API, in the next release or later on? Thanks in advance!
Multi-GPU? It should already work.
For the Python version? I was testing it and comparing it with the C++ version, and the results were totally different.
What do you mean, they are different?
I was testing the example "05_keypoints_from_images_multi_gpu" in both the Python and the C++ versions, and the results were different. I changed the example to read a video (the example video in the media folder) using OpenCV. My computer has two graphics cards (2x Nvidia GTX 1080). I monitored GPU usage with "watch nvidia-smi", and in both cases the utilization increases on both cards. The problem is speed: the C++ version processes the video in about 7.5 seconds, while the Python version takes about twice as long (about 15 seconds). So I'm asking whether this behaviour is normal or whether I'm doing something wrong. Thanks!
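When comparing the two versions, throughput in frames per second over the same video is easier to compare than total wall time, and `time.perf_counter()` is a better clock for measuring intervals than `time.time()`. A minimal, library-agnostic sketch; `measure_throughput` is a hypothetical helper and `process_frame` stands in for whatever sends one frame through OpenPose:

```python
import time


def measure_throughput(frames, process_frame):
    """Run process_frame on every frame; return (elapsed_seconds, frames_per_second)."""
    start = time.perf_counter()
    count = 0
    for frame in frames:
        process_frame(frame)
        count += 1
    elapsed = time.perf_counter() - start
    fps = count / elapsed if elapsed > 0 else float("inf")
    return elapsed, fps
```

Running both versions on the same video with the same flags (for example the same `net_resolution`) keeps the numbers comparable.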
Hmm, I tested it on the image folder example, and as I recall that gave the same speed. Maybe you can send me a code example.
Of course, here's my Python code:
import sys
import time
import argparse

import cv2
from openpose import pyopenpose as op

try:
    parser = argparse.ArgumentParser()
    parser.add_argument("--video", default="examples/media/video.mp4", help="Path to the input video.")
    parser.add_argument("--no_display", default=False, help="Enable to disable the visual display.")
    args = parser.parse_known_args()

    # Custom params (see include/openpose/flags.hpp for the full list)
    params = dict()
    params["model_folder"] = "MODELS_PATH"
    params["disable_multi_thread"] = "false"

    # Forward any extra "--flag value" pairs from the command line to OpenPose
    for i in range(0, len(args[1])):
        curr_item = args[1][i]
        if i != len(args[1])-1: next_item = args[1][i+1]
        else: next_item = "1"
        if "--" in curr_item and "--" in next_item:
            key = curr_item.replace('-','')
            if key not in params: params[key] = "1"
        elif "--" in curr_item and "--" not in next_item:
            key = curr_item.replace('-','')
            if key not in params: params[key] = next_item

    # Number of GPUs: from --num_gpu if given (values arrive as strings; -1 or absent means all GPUs)
    numberGPUs = int(params.get("num_gpu", -1))
    if numberGPUs < 0:
        numberGPUs = op.get_gpu_number()

    # Starting OpenPose
    opWrapper = op.WrapperPython()
    opWrapper.configure(params)
    opWrapper.start()

    videoPath = args[0].video
    start = time.time()
    cap = cv2.VideoCapture(videoPath)
    while cap.isOpened():
        grabbed, frame = cap.read()
        if frame is None or not grabbed:
            print("Finished reading video frames.")
            break

        # Push the frame into OpenPose.
        # Note: every GPU receives this same frame, so each frame is processed numberGPUs times.
        datums = []
        for gpuId in range(0, numberGPUs):
            datum = op.Datum()
            datum.cvInputData = frame
            datums.append(datum)
            opWrapper.waitAndEmplace([datums[-1]])

        # Retrieve the processed results from the OpenPose wrapper
        for gpuId in range(0, numberGPUs):
            datum = datums[gpuId]
            opWrapper.waitAndPop([datum])
            print("Body keypoints: \n" + str(datum.poseKeypoints))
            if not args[0].no_display:
                cv2.imshow("OpenPose 1.5.0 - Tutorial Python API", datum.cvOutputData)
                key = cv2.waitKey(1)
                if key == 27: break
    cap.release()
    end = time.time()
    print("OpenPose demo successfully finished. Total time: " + str(end - start) + " seconds")
except Exception as e:
    print(e)
    sys.exit(-1)
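For comparison with the C++ loop (which advances `imageBaseId` by `numberGPUs` so that each GPU gets a different frame), the Python loop above hands the same frame to every GPU. A sketch that instead reads up to `num_gpus` distinct frames per round, assuming the 1.5-era `waitAndEmplace`/`waitAndPop` bindings that take a list of `Datum` objects, as in the multi-GPU Python example. `read_batch` and `process_video` are hypothetical names, and `cap`, `wrapper` and `new_datum` are passed in so the logic can be exercised without OpenPose:

```python
def read_batch(cap, batch_size):
    """Read up to batch_size frames from a cv2.VideoCapture-like object."""
    frames = []
    while len(frames) < batch_size:
        grabbed, frame = cap.read()
        if not grabbed or frame is None:
            break
        frames.append(frame)
    return frames


def process_video(cap, wrapper, new_datum, num_gpus):
    """Give each GPU a different frame per round; return pose keypoints in frame order."""
    all_keypoints = []
    while True:
        frames = read_batch(cap, num_gpus)
        if not frames:
            break
        datums = []
        # Push one distinct frame per GPU
        for frame in frames:
            datum = new_datum()
            datum.cvInputData = frame
            datums.append(datum)
            wrapper.waitAndEmplace([datum])
        # Wait for each pushed datum to come back, in push order
        for datum in datums:
            wrapper.waitAndPop([datum])
            all_keypoints.append(datum.poseKeypoints)
    return all_keypoints
```

With OpenPose this would be called as `process_video(cv2.VideoCapture(videoPath), opWrapper, op.Datum, numberGPUs)` after `opWrapper.start()`.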
And here's my C++ code (in this version I first store all the video frames in a vector and then loop over them to process them):
// --------------- OpenPose C++ API Tutorial - Example 5 - Body from images and multi GPU ---------------
// Modified to read the frames of a video, process them, and display them with the pose (and optionally hand and
// face) keypoints. In addition, it includes all the OpenPose configuration flags (enable/disable hand, face, output
// saving, etc.).
// Command-line user interface
// Disable the built-in producer and display flags so the custom `video` and `no_display` flags below do not clash
#define OPENPOSE_FLAGS_DISABLE_PRODUCER
#define OPENPOSE_FLAGS_DISABLE_DISPLAY
#include <openpose/flags.hpp>
// OpenPose dependencies
#include <openpose/headers.hpp>
#include <opencv4/opencv2/opencv.hpp>
#include <iostream>
#include <limits>
#include <vector>
using namespace std;
using namespace cv;

// Custom OpenPose flags
// Producer
DEFINE_string(video, "examples/media/video.mp4",
    "Process input video.");
// OpenPose
DEFINE_bool(latency_is_irrelevant_and_computer_with_lots_of_ram, false,
    "If false, it will read and then process images right away. If true, it will first store all the frames and"
    " later process them (slightly faster). However: 1) Latency will hugely increase (no frames will be processed"
    " until they have all been read). And 2) The program might go out of RAM memory with long videos or folders with"
    " many images (so the computer might freeze).");
// Display
DEFINE_bool(no_display, false,
    "Enable to disable the visual display.");

// Displays the rendered frame of the first datum; returns true if the user pressed Esc
bool display(const std::shared_ptr<std::vector<std::shared_ptr<op::Datum>>>& datumsPtr)
{
    try
    {
        // User's displaying/saving/other processing here
        //     datum.cvOutputData: rendered frame with pose or heatmaps
        //     datum.poseKeypoints: Array<float> with the estimated pose
        if (datumsPtr != nullptr && !datumsPtr->empty())
        {
            // Display image and sleeps at least 1 ms (it usually sleeps ~5-10 msec to display the image)
            cv::imshow(OPEN_POSE_NAME_AND_VERSION + " - Tutorial C++ API", datumsPtr->at(0)->cvOutputData);
        }
        else
            op::log("Nullptr or empty datumsPtr found.", op::Priority::High);
        const auto key = (char)cv::waitKey(1);
        return (key == 27);
    }
    catch (const std::exception& e)
    {
        op::error(e.what(), __LINE__, __FUNCTION__, __FILE__);
        return true;
    }
}

void printKeypoints(const std::shared_ptr<std::vector<std::shared_ptr<op::Datum>>>& datumsPtr)
{
    try
    {
        // Example: How to use the pose keypoints
        if (datumsPtr != nullptr && !datumsPtr->empty())
        {
            op::log("Body keypoints: " + datumsPtr->at(0)->poseKeypoints.toString(), op::Priority::High);
            op::log("Face keypoints: " + datumsPtr->at(0)->faceKeypoints.toString(), op::Priority::High);
            op::log("Left hand keypoints: " + datumsPtr->at(0)->handKeypoints[0].toString(), op::Priority::High);
            op::log("Right hand keypoints: " + datumsPtr->at(0)->handKeypoints[1].toString(), op::Priority::High);
        }
        else
            op::log("Nullptr or empty datumsPtr found.", op::Priority::High);
    }
    catch (const std::exception& e)
    {
        op::error(e.what(), __LINE__, __FUNCTION__, __FILE__);
    }
}

void configureWrapper(op::Wrapper& opWrapper)
{
    try
    {
        // Configuring OpenPose
        // logging_level
        op::check(0 <= FLAGS_logging_level && FLAGS_logging_level <= 255, "Wrong logging_level value.",
                  __LINE__, __FUNCTION__, __FILE__);
        // Applying user defined configuration - GFlags to program variables
        // outputSize
        const auto outputSize = op::flagsToPoint(FLAGS_output_resolution, "-1x-1");
        // netInputSize
        const auto netInputSize = op::flagsToPoint(FLAGS_net_resolution, "-1x368");
        // faceNetInputSize
        const auto faceNetInputSize = op::flagsToPoint(FLAGS_face_net_resolution, "368x368 (multiples of 16)");
        // handNetInputSize
        const auto handNetInputSize = op::flagsToPoint(FLAGS_hand_net_resolution, "368x368 (multiples of 16)");
        // poseMode
        const auto poseMode = op::flagsToPoseMode(FLAGS_body);
        // poseModel
        const auto poseModel = op::flagsToPoseModel(FLAGS_model_pose);
        // JSON saving
        if (!FLAGS_write_keypoint.empty())
            op::log("Flag `write_keypoint` is deprecated and will eventually be removed."
                    " Please, use `write_json` instead.", op::Priority::Max);
        // keypointScaleMode
        const auto keypointScaleMode = op::flagsToScaleMode(FLAGS_keypoint_scale);
        // heatmaps to add
        const auto heatMapTypes = op::flagsToHeatMaps(FLAGS_heatmaps_add_parts, FLAGS_heatmaps_add_bkg,
                                                      FLAGS_heatmaps_add_PAFs);
        const auto heatMapScaleMode = op::flagsToHeatMapScaleMode(FLAGS_heatmaps_scale);
        // >1 camera view?
        const auto multipleView = (FLAGS_3d || FLAGS_3d_views > 1);
        // Face and hand detectors
        const auto faceDetector = op::flagsToDetector(FLAGS_face_detector);
        const auto handDetector = op::flagsToDetector(FLAGS_hand_detector);
        // Enabling Google Logging
        const bool enableGoogleLogging = true;

        // Pose configuration (use WrapperStructPose{} for default and recommended configuration)
        const op::WrapperStructPose wrapperStructPose{
            poseMode, netInputSize, outputSize, keypointScaleMode, FLAGS_num_gpu, FLAGS_num_gpu_start,
            FLAGS_scale_number, (float)FLAGS_scale_gap, op::flagsToRenderMode(FLAGS_render_pose, multipleView),
            poseModel, !FLAGS_disable_blending, (float)FLAGS_alpha_pose, (float)FLAGS_alpha_heatmap,
            FLAGS_part_to_show, FLAGS_model_folder, heatMapTypes, heatMapScaleMode, FLAGS_part_candidates,
            (float)FLAGS_render_threshold, FLAGS_number_people_max, FLAGS_maximize_positives, FLAGS_fps_max,
            FLAGS_prototxt_path, FLAGS_caffemodel_path, (float)FLAGS_upsampling_ratio, enableGoogleLogging};
        opWrapper.configure(wrapperStructPose);
        // Face configuration (use op::WrapperStructFace{} to disable it)
        const op::WrapperStructFace wrapperStructFace{
            FLAGS_face, faceDetector, faceNetInputSize,
            op::flagsToRenderMode(FLAGS_face_render, multipleView, FLAGS_render_pose),
            (float)FLAGS_face_alpha_pose, (float)FLAGS_face_alpha_heatmap, (float)FLAGS_face_render_threshold};
        opWrapper.configure(wrapperStructFace);
        // Hand configuration (use op::WrapperStructHand{} to disable it)
        const op::WrapperStructHand wrapperStructHand{
            FLAGS_hand, handDetector, handNetInputSize, FLAGS_hand_scale_number, (float)FLAGS_hand_scale_range,
            op::flagsToRenderMode(FLAGS_hand_render, multipleView, FLAGS_render_pose), (float)FLAGS_hand_alpha_pose,
            (float)FLAGS_hand_alpha_heatmap, (float)FLAGS_hand_render_threshold};
        opWrapper.configure(wrapperStructHand);
        // Extra functionality configuration (use op::WrapperStructExtra{} to disable it)
        const op::WrapperStructExtra wrapperStructExtra{
            FLAGS_3d, FLAGS_3d_min_views, FLAGS_identification, FLAGS_tracking, FLAGS_ik_threads};
        opWrapper.configure(wrapperStructExtra);
        // Output (comment or use default argument to disable any output)
        const op::WrapperStructOutput wrapperStructOutput{
            FLAGS_cli_verbose, FLAGS_write_keypoint, op::stringToDataFormat(FLAGS_write_keypoint_format),
            FLAGS_write_json, FLAGS_write_coco_json, FLAGS_write_coco_json_variants, FLAGS_write_coco_json_variant,
            FLAGS_write_images, FLAGS_write_images_format, FLAGS_write_video, FLAGS_write_video_fps,
            FLAGS_write_video_with_audio, FLAGS_write_heatmaps, FLAGS_write_heatmaps_format, FLAGS_write_video_3d,
            FLAGS_write_video_adam, FLAGS_write_bvh, FLAGS_udp_host, FLAGS_udp_port};
        opWrapper.configure(wrapperStructOutput);
        // No GUI. Equivalent to: opWrapper.configure(op::WrapperStructGui{});
        // Set to single-thread (for sequential processing and/or debugging and/or reducing latency)
        if (FLAGS_disable_multi_thread)
            opWrapper.disableMultiThreading();
    }
    catch (const std::exception& e)
    {
        op::error(e.what(), __LINE__, __FUNCTION__, __FILE__);
    }
}

int tutorialApiCpp()
{
    try
    {
        op::log("Starting OpenPose demo...", op::Priority::High);
        const auto opTimer = op::getTimerInit();

        // Configuring OpenPose
        op::log("Configuring OpenPose...", op::Priority::High);
        op::Wrapper opWrapper{op::ThreadManagerMode::Asynchronous};
        configureWrapper(opWrapper);
        // Increase maximum wrapper queue size
        if (FLAGS_latency_is_irrelevant_and_computer_with_lots_of_ram)
            opWrapper.setDefaultMaxSizeQueues(std::numeric_limits<long long>::max());

        // Starting OpenPose
        op::log("Starting thread(s)...", op::Priority::High);
        opWrapper.start();

        // Read all the frames of the video into memory
        VideoCapture cap(FLAGS_video);
        vector<Mat> framesVideo;
        Mat frame;
        if (!FLAGS_latency_is_irrelevant_and_computer_with_lots_of_ram)
        {
            // Read number of GPUs in your system
            const auto numberGPUs = op::getGpuNumber();
            while (cap.isOpened())
            {
                cap >> frame;
                // If the frame is empty, break immediately
                if (frame.empty())
                    break;
                // clone() so that each stored frame owns its own pixel buffer
                framesVideo.push_back(frame.clone());
            }
            cout << "Vector of frames created from video" << endl;

            for (auto imageBaseId = 0u ; imageBaseId < framesVideo.size() ; imageBaseId+=numberGPUs)
            {
                // Push one different frame per GPU into the OpenPose wrapper
                for (auto gpuId = 0 ; gpuId < numberGPUs ; gpuId++)
                {
                    const auto imageId = imageBaseId+gpuId;
                    if (imageId < framesVideo.size())
                    {
                        // Faster alternative that moves imageToProcess
                        auto imageToProcess = framesVideo.at(imageId);
                        opWrapper.waitAndEmplace(imageToProcess);
                        // // Slower but safer alternative that copies imageToProcess
                        // const auto imageToProcess = framesVideo.at(imageId);
                        // opWrapper.waitAndPush(imageToProcess);
                    }
                }
                // Retrieve processed results from OpenPose wrapper
                for (auto gpuId = 0 ; gpuId < numberGPUs ; gpuId++)
                {
                    const auto imageId = imageBaseId+gpuId;
                    if (imageId < framesVideo.size())
                    {
                        std::shared_ptr<std::vector<std::shared_ptr<op::Datum>>> datumProcessed;
                        const auto status = opWrapper.waitAndPop(datumProcessed);
                        if (status && datumProcessed != nullptr)
                        {
                            if (!FLAGS_no_display)
                            {
                                const auto userWantsToExit = display(datumProcessed);
                                if (userWantsToExit)
                                {
                                    op::log("User pressed Esc to exit demo.", op::Priority::High);
                                    break;
                                }
                            }
                        }
                        else
                            op::log("Image could not be processed.", op::Priority::High);
                    }
                }
            }
        }

        // Measuring total time
        op::printTime(opTimer, "OpenPose demo successfully finished. Total time: ", " seconds.", op::Priority::High);
        // Return
        return 0;
    }
    catch (const std::exception& e)
    {
        return -1;
    }
}

int main(int argc, char *argv[])
{
    // Parsing command line flags
    gflags::ParseCommandLineFlags(&argc, &argv, true);
    // Running tutorialApiCpp
    return tutorialApiCpp();
}
I hope this makes it clear.
I will look at it once I get back.
Hi, would you prefer that I send you the code directly by email?
Issue Summary
I am using the OpenPose Python API. Does the Python API support multiple GPUs? Is it possible to run pose estimation on a video instead of images? How many GPUs are supported in total?
Executed Command (if any)
Note: add --logging_level 0 --disable_multi_thread
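With the Python API, the same flags are passed as keys of the params dictionary rather than on the command line. A small sketch that follows the same convention as the argument loop in the Python script earlier in this thread (a flag with no value becomes "1", and keys already set in params are not overwritten); the helper name `extra_args_to_params` is hypothetical:

```python
def extra_args_to_params(extra_args, params=None):
    """Turn leftover tokens like ["--logging_level", "0", "--disable_multi_thread"] into an OpenPose params dict."""
    params = dict(params or {})
    for i, item in enumerate(extra_args):
        if not item.startswith("--"):
            continue  # plain values are consumed together with the flag before them
        key = item.lstrip("-")
        next_item = extra_args[i + 1] if i + 1 < len(extra_args) else None
        # A flag followed by another flag (or by nothing) is treated as a boolean switch
        value = "1" if next_item is None or next_item.startswith("--") else next_item
        params.setdefault(key, value)  # existing keys take precedence
    return params
```

For example, `extra_args_to_params(["--logging_level", "0", "--disable_multi_thread"], {"model_folder": "models/"})` yields a dict with `logging_level` set to "0" and `disable_multi_thread` set to "1", ready for `opWrapper.configure(...)`.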
OpenPose Output (if any)
Type of Issue
You might select multiple topics, delete the rest:
Your System Configuration
OpenPose version: Latest GitHub code? Or specific commit (e.g., d52878f)? Or specific version from section (e.g., 1.2.0)?
General configuration:
- Installation mode: CMake
- Operating system (lsb_release -a in Ubuntu): Ubuntu 14.04.5 LTS
- Release or Debug mode? (by default: release): release
- Compiler (gcc --version in Ubuntu or VS version in Windows): gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
Non-default settings:
3rd-party software:
- CMake version (cmake --version in Ubuntu): cmake version 3.9.3
- OpenCV version: apt-get install libopencv-dev (only Ubuntu)
If GPU mode issue:
- CUDA version (cat /usr/local/cuda/version.txt in most cases): CUDA Version 8.0.61
- GPU model (nvidia-smi in Ubuntu): 1080 Ti
If CPU-only mode issue:
If Python API:
- Numpy version (python -c "import numpy; print numpy.version.version" in Ubuntu):
If Windows system:
If speed performance issue: