cuiaiyu / dressing-in-order

(ICCV'21) Official code of "Dressing in Order: Recurrent Person Image Generation for Pose Transfer, Virtual Try-on and Outfit Editing" by Aiyu Cui, Daniel McKee and Svetlana Lazebnik
https://cuiaiyu.github.io/dressing-in-order

Real time results are not matching #12

Closed: LeftAttention closed this issue 2 years ago

LeftAttention commented 2 years ago

I was trying inference on some images from standard_test_anns.txt. I got the keypoints from openpose and the mask from SCHP, then used Dressing in Order as in demo.ipynb with changed gids. The results from demo.ipynb are perfect, but my results are not up to the mark. Also, the keypoints from openpose do not match the dataset keypoints provided by GFLA.

This is the result generated by demo.ipynb: [image a111]

This is my result: [image a112]

Could you please suggest where I can generate the keypoints to match the results? Or is there anything I missed out?

Thanks in advance.

cuiaiyu commented 2 years ago

I think the difference is caused by the "BODY_25" keypoint labels. Check this doc: there are two sets of keypoint labels that openpose supports, BODY_25 and COCO, and the difference is that the 8-th joint is the mid hip in BODY_25 but a regular hip joint in COCO. We are using COCO, but the default for openpose is BODY_25.

You can either re-run openpose with the COCO model or try a conversion function that removes the 8-th joint (mid hip) from BODY_25, as sketched below.
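For reference, a minimal sketch of that idea, assuming the BODY_25 keypoints are already loaded as a (25, 3) array of [x, y, confidence] rows (the function name is just for illustration):

import numpy as np

def body25_to_coco18(kps_25):
    # kps_25: (25, 3) array of [x, y, confidence] rows in BODY_25 order.
    # Removing joint 8 (MidHip) and dropping the foot joints (19-24)
    # recovers the 18-joint COCO ordering.
    keep = [i for i in range(25) if i != 8][:18]
    return kps_25[keep]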

LeftAttention commented 2 years ago

Thanks for the reply.

I am currently using the OpenCV DNN wrapper for openpose. Here is my keypoint extraction code snippet.

import cv2
import numpy as np

protoFile = "weights/coco/pose_deploy_linevec.prototxt"
weightsFile = "weights/coco/pose_iter_440000.caffemodel"
nPoints = 18

image_file = "test_01.jpg"

frame = cv2.imread(image_file)

frameWidth = 176
frameHeight = 256
threshold = 0.1

net = cv2.dnn.readNetFromCaffe(protoFile, weightsFile)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

inWidth = 368
inHeight = 368
inpBlob = cv2.dnn.blobFromImage(frame, 1.0 / 255, (inWidth, inHeight),
                          (0, 0, 0), swapRB=False, crop=False)

net.setInput(inpBlob)

output = net.forward()

H = output.shape[2]
W = output.shape[3]

points = []  # (x, y) per joint, or (-1, -1) if the joint is not detected
for i in range(nPoints):
    # confidence map of the i-th keypoint
    probMap = output[0, i, :, :]

    # location of the maximum of the confidence map
    minVal, prob, minLoc, point = cv2.minMaxLoc(probMap)

    # scale from the network output resolution back to the image size
    x = (frameWidth * point[0]) / W
    y = (frameHeight * point[1]) / H

    if prob > threshold:
        points.append((int(x), int(y)))
    else:
        points.append((-1, -1))

Once I get the pose keypoints, I convert them to a tensor as in demo.ipynb.

pose_array = np.array(points)
pose  = pose_utils.cords_to_map(pose_array, self.load_size, (256, 176))
pose = np.transpose(pose,(2, 0, 1))
pose = torch.Tensor(pose)

But the 18 body keypoints obtained by this method do not match the dataset.

cuiaiyu commented 2 years ago

It seems PATN gets the key points from this library: https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation. Maybe you can try that to get data in the COCO format?

For the 'BODY_25' format, if you obtain test_keypoints.json from the test.png images by running the openpose command line, you can try something like this conversion function:

import json
import numpy as np
import torch
# pose_utils here is the repo's utility module that provides cords_to_map

def load_pose_from_json(pose_json, target_size=(256, 256), orig_size=(256, 256)):
    with open(pose_json, 'r') as f:
        anno = json.load(f)
    if len(anno['people']) < 1:
        a, b = target_size
        return torch.zeros((18, a, b))
    anno = list(anno['people'][0]['pose_keypoints_2d'])
    # openpose stores keypoints as [x0, y0, c0, x1, y1, c1, ...];
    # the coordinates are passed to cords_to_map in (y, x) order,
    # matching the dataset annotations, so the y values go first.
    x = np.array(anno[1::3])  # image y (row) coordinates
    y = np.array(anno[::3])   # image x (col) coordinates
    # shift out joint 8 (mid hip) so BODY_25 matches the 18-joint COCO order;
    # the extra trailing joints are discarded by the final [:18] slice.
    x[8:-1] = x[9:]
    y[8:-1] = y[9:]
    # openpose reports undetected joints as 0; mark them as missing (-1)
    x[x == 0] = -1
    y[y == 0] = -1
    coord = np.concatenate([x[:, None], y[:, None]], -1)
    pose = pose_utils.cords_to_map(coord, target_size, orig_size)
    pose = np.transpose(pose, (2, 0, 1))
    pose = torch.Tensor(pose)
    return pose[:18]
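Usage would look something like this (the json path is just a placeholder; (256, 176) is the DeepFashion image size used above):

pose = load_pose_from_json("test_keypoints.json",
                           target_size=(256, 176), orig_size=(256, 176))
print(pose.shape)  # torch.Size([18, 256, 176])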