Daniil-Osokin / lightweight-human-pose-estimation-3d-demo.pytorch

Real-time 3D multi-person pose estimation demo in PyTorch. OpenVINO backend can be used for fast inference on CPU.
Apache License 2.0

Image sequence and location of data output #56

Closed · jamalknight closed 3 years ago

jamalknight commented 3 years ago

Hi there

Thanks for setting up this repo; the setup instructions are clear.

I have a couple of questions. I would like to run this model on an image sequence of my own, but a window pops up with only 1 image.

The command I run: python demo.py --model human-pose-estimation-3d.pth --image ./00_03/*.jpg

My list of images:

00_03_00000001.jpg
00_03_00000002.jpg
00_03_00000003.jpg
...

My other question is: where is the output of the 3D keypoint coordinates written?

Thanks

Daniil-Osokin commented 3 years ago

Hi! The pattern will not work here. You can list the images one by one: ./00_03/00_03_00000001.jpg ./00_03/00_03_00000002.jpg ... or modify the code and call cv2.VideoCapture directly with such a pattern.
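For reference, cv2.VideoCapture itself understands printf-style image-sequence patterns, so a minimal sketch of the second option (assuming the 00_03_00000001.jpg naming above, i.e. a 00_03_%08d.jpg pattern) could be:

import cv2

# VideoCapture expands printf-style patterns into an image sequence
cap = cv2.VideoCapture('./00_03/00_03_%08d.jpg', cv2.CAP_IMAGES)
while True:
    ok, frame = cap.read()
    if not ok:  # no more images match the pattern
        break
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) == 27:  # Esc to stop early
        break
cv2.destroyAllWindows()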

jamalknight commented 3 years ago

Thanks for clarifying.

Where can I find the output of the keypoints as 3D space coordinates?

Thanks

Daniil-Osokin commented 3 years ago

Will #28 help?

Daniil-Osokin commented 3 years ago

Hope it helped.

jamalknight commented 3 years ago

Hi there

Is there a .pkl output with the 3D coordinates? I am looking for the keypoint output in world space, but cannot see where it would be written.

Thanks

Daniil-Osokin commented 3 years ago

Hi! You can find the pose coordinates in world space in poses_3d after line 110.
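At that point poses_3d is a numpy array of shape (num_persons, 19, 3), one (x, y, z) triple per keypoint. A minimal sketch of inspecting it there (variable name as in demo.py):

# poses_3d: world-space keypoints, shape (num_persons, 19, 3)
for person_id, pose in enumerate(poses_3d):
    for kpt_id, (x, y, z) in enumerate(pose):
        print('person {}, keypoint {}: ({:.2f}, {:.2f}, {:.2f})'.format(person_id, kpt_id, x, y, z))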

jamalknight commented 3 years ago

So would I have to write code to output the data to a file after line 110?

Daniil-Osokin commented 3 years ago

Yes.

jamalknight commented 3 years ago

Could you please let me know what code would output the 3D keypoint positions to a file? (Sorry, I have limited Python knowledge.)

Thanks

Daniil-Osokin commented 3 years ago

Here is the diff:

diff --git a/demo.py b/demo.py
index 6ccd39e..b18d40c 100644
--- a/demo.py
+++ b/demo.py
@@ -4,6 +4,7 @@ import os

 import cv2
 import numpy as np
+import pickle

 from modules.input_reader import VideoReader, ImageReader
 from modules.draw import Plotter3d, draw_poses
@@ -85,6 +86,8 @@ if __name__ == '__main__':
     p_code = 112
     space_code = 32
     mean_time = 0
+    f = open('coordinates_3d.pkl', 'wb')
+    coordinates_history = []
     for frame in frame_provider:
         current_time = cv2.getTickCount()
         if frame is None:
@@ -108,6 +111,7 @@ if __name__ == '__main__':

             poses_3d = poses_3d.reshape(poses_3d.shape[0], 19, -1)[:, :, 0:3]
             edges = (Plotter3d.SKELETON_EDGES + 19 * np.arange(poses_3d.shape[0]).reshape((-1, 1, 1))).reshape((-1, 2))
+        coordinates_history.append(poses_3d)
         plotter.plot(canvas_3d, poses_3d, edges)
         cv2.imshow(canvas_3d_window_name, canvas_3d)

@@ -141,3 +145,5 @@ if __name__ == '__main__':
                 break
             else:
                 delay = 1
+    pickle.dump(coordinates_history, f)
+    f.close()
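
Once the demo exits, the pickle can be read back; a minimal sketch (coordinates_3d.pkl is the file written by the diff above):

import pickle

with open('coordinates_3d.pkl', 'rb') as f:
    coordinates_history = pickle.load(f)

# one list entry per processed frame; each entry is that frame's
# poses_3d array, typically of shape (num_persons, 19, 3)
print(len(coordinates_history))
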
jamalknight commented 3 years ago

I added the above code after line 110, but got an error.

To run: python demo.py --model human-pose-estimation-3d.pth --video 0

error:

  File "demo.py", line 114
    diff --git a/demo.py b/demo.py
               ^
SyntaxError: invalid syntax

Is there something I'm missing?

I uploaded demo.py here:

from argparse import ArgumentParser
import json
import os

import cv2
import numpy as np

from modules.input_reader import VideoReader, ImageReader
from modules.draw import Plotter3d, draw_poses
from modules.parse_poses import parse_poses

def rotate_poses(poses_3d, R, t):
    R_inv = np.linalg.inv(R)
    for pose_id in range(len(poses_3d)):
        pose_3d = poses_3d[pose_id].reshape((-1, 4)).transpose()
        pose_3d[0:3, :] = np.dot(R_inv, pose_3d[0:3, :] - t)
        poses_3d[pose_id] = pose_3d.transpose().reshape(-1)

    return poses_3d

if __name__ == '__main__':
    parser = ArgumentParser(description='Lightweight 3D human pose estimation demo. '
                                        'Press esc to exit, "p" to (un)pause video or process next image.')
    parser.add_argument('-m', '--model',
                        help='Required. Path to checkpoint with a trained model '
                             '(or an .xml file in case of OpenVINO inference).',
                        type=str, required=True)
    parser.add_argument('--video', help='Optional. Path to video file or camera id.', type=str, default='')
    parser.add_argument('-d', '--device',
                        help='Optional. Specify the target device to infer on: CPU or GPU. '
                             'The demo will look for a suitable plugin for device specified '
                             '(by default, it is GPU).',
                        type=str, default='GPU')
    parser.add_argument('--use-openvino',
                        help='Optional. Run network with OpenVINO as inference engine. '
                             'CPU, GPU, FPGA, HDDL or MYRIAD devices are supported.',
                        action='store_true')
    parser.add_argument('--use-tensorrt', help='Optional. Run network with TensorRT as inference engine.',
                        action='store_true')
    parser.add_argument('--images', help='Optional. Path to input image(s).', nargs='+', default='')
    parser.add_argument('--height-size', help='Optional. Network input layer height size.', type=int, default=256)
    parser.add_argument('--extrinsics-path',
                        help='Optional. Path to file with camera extrinsics.',
                        type=str, default=None)
    parser.add_argument('--fx', type=np.float32, default=-1, help='Optional. Camera focal length.')
    args = parser.parse_args()

    if args.video == '' and args.images == '':
        raise ValueError('Either --video or --image has to be provided')

    stride = 8
    if args.use_openvino:
        from modules.inference_engine_openvino import InferenceEngineOpenVINO
        net = InferenceEngineOpenVINO(args.model, args.device)
    else:
        from modules.inference_engine_pytorch import InferenceEnginePyTorch
        net = InferenceEnginePyTorch(args.model, args.device, use_tensorrt=args.use_tensorrt)

    canvas_3d = np.zeros((720, 1280, 3), dtype=np.uint8)
    plotter = Plotter3d(canvas_3d.shape[:2])
    canvas_3d_window_name = 'Canvas 3D'
    cv2.namedWindow(canvas_3d_window_name)
    cv2.setMouseCallback(canvas_3d_window_name, Plotter3d.mouse_callback)

    file_path = args.extrinsics_path
    if file_path is None:
        file_path = os.path.join('data', 'extrinsics.json')
    with open(file_path, 'r') as f:
        extrinsics = json.load(f)
    R = np.array(extrinsics['R'], dtype=np.float32)
    t = np.array(extrinsics['t'], dtype=np.float32)

    frame_provider = ImageReader(args.images)
    is_video = False
    if args.video != '':
        frame_provider = VideoReader(args.video)
        is_video = True
    base_height = args.height_size
    fx = args.fx

    delay = 1
    esc_code = 27
    p_code = 112
    space_code = 32
    mean_time = 0
    for frame in frame_provider:
        current_time = cv2.getTickCount()
        if frame is None:
            break
        input_scale = base_height / frame.shape[0]
        scaled_img = cv2.resize(frame, dsize=None, fx=input_scale, fy=input_scale)
        scaled_img = scaled_img[:, 0:scaled_img.shape[1] - (scaled_img.shape[1] % stride)]  # better to pad, but cut out for demo
        if fx < 0:  # Focal length is unknown
            fx = np.float32(0.8 * frame.shape[1])

        inference_result = net.infer(scaled_img)
        poses_3d, poses_2d = parse_poses(inference_result, input_scale, stride, fx, is_video)
        edges = []
        if len(poses_3d):
            poses_3d = rotate_poses(poses_3d, R, t)
            poses_3d_copy = poses_3d.copy()
            x = poses_3d_copy[:, 0::4]
            y = poses_3d_copy[:, 1::4]
            z = poses_3d_copy[:, 2::4]
            poses_3d[:, 0::4], poses_3d[:, 1::4], poses_3d[:, 2::4] = -z, x, -y

            poses_3d = poses_3d.reshape(poses_3d.shape[0], 19, -1)[:, :, 0:3]
            edges = (Plotter3d.SKELETON_EDGES + 19 * np.arange(poses_3d.shape[0]).reshape((-1, 1, 1))).reshape((-1, 2))

#adding code to print 3d positions

diff --git a/demo.py b/demo.py
index 6ccd39e..b18d40c 100644
--- a/demo.py
+++ b/demo.py
@@ -4,6 +4,7 @@ import os

import cv2
import numpy as np
+import pickle

from modules.input_reader import VideoReader, ImageReader
from modules.draw import Plotter3d, draw_poses
@@ -85,6 +86,8 @@ if __name__ == '__main__':
     p_code = 112
     space_code = 32
     mean_time = 0
+    f = open('coordinates_3d.pkl', 'wb')
+    coordinates_history = []
     for frame in frame_provider:
     current_time = cv2.getTickCount()
     if frame is None:
@@ -108,6 +111,7 @@ if __name__ == '__main__':

         poses_3d = poses_3d.reshape(poses_3d.shape[0], 19, -1)[:, :, 0:3]
         edges = (Plotter3d.SKELETON_EDGES + 19 * np.arange(poses_3d.shape[0]).reshape((-1, 1, 1))).reshape((-1, 2))
+        coordinates_history.append(poses_3d)
     plotter.plot(canvas_3d, poses_3d, edges)
     cv2.imshow(canvas_3d_window_name, canvas_3d)

@@ -141,3 +145,5 @@ if __name__ == '__main__':
         break
         else:
         delay = 1
+    pickle.dump(coordinates_history, f)
+    f.close()

#end adding code to print 3d positions

        plotter.plot(canvas_3d, poses_3d, edges)
        cv2.imshow(canvas_3d_window_name, canvas_3d)

        draw_poses(frame, poses_2d)
        current_time = (cv2.getTickCount() - current_time) / cv2.getTickFrequency()
        if mean_time == 0:
            mean_time = current_time
        else:
            mean_time = mean_time * 0.95 + current_time * 0.05
        cv2.putText(frame, 'FPS: {}'.format(int(1 / mean_time * 10) / 10),
                    (40, 80), cv2.FONT_HERSHEY_COMPLEX, 1, (0, 0, 255))
        cv2.imshow('ICV 3D Human Pose Estimation', frame)

        key = cv2.waitKey(delay)
        if key == esc_code:
            break
        if key == p_code:
            if delay == 1:
                delay = 0
            else:
                delay = 1
        if delay == 0 or not is_video:  # allow to rotate 3D canvas while on pause
            key = 0
            while (key != p_code
                   and key != esc_code
                   and key != space_code):
                plotter.plot(canvas_3d, poses_3d, edges)
                cv2.imshow(canvas_3d_window_name, canvas_3d)
                key = cv2.waitKey(33)
            if key == esc_code:
                break
            else:
                delay = 1

Thanks

Daniil-Osokin commented 3 years ago

I have shared a diff file; you can read more about the format here. Basically, you need to add the lines marked with a plus sign (+) in the diff to demo.py, not paste the diff text itself. The surrounding lines in the diff show where in demo.py to add them. If you are using git with this repository, you can just run git apply <path-to-diff-file> from the repository root to apply the changes.
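For example (assuming the diff was saved as coordinates.diff; the file name here is illustrative):

# from the repository root
git apply coordinates.diff
python demo.py --model human-pose-estimation-3d.pth --video 0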

jamalknight commented 3 years ago

Got it! Thanks