ankanbhunia / PIDM

Person Image Synthesis via Denoising Diffusion Model (CVPR 2023)
https://ankanbhunia.github.io/PIDM
MIT License

About tgt_pose during inference #27

Open gouchaonijiao opened 1 year ago

gouchaonijiao commented 1 year ago

Hello! Thanks for your amazing work! I would like to know how I can generate more "data/deepfashion_256x256/target_pose/*.npy" files.

gouchaonijiao commented 1 year ago

Since only 6 keypoint files are available, I'd like to generate more keypoint .npy files with your method. If you could reply, I would greatly appreciate it!

ankanbhunia commented 1 year ago

Please use the following code to generate the target pose (tgt_pose) tensors. First, download the keypoints (pose.rar, extracted with OpenPose) from [Google Drive](https://drive.google.com/file/d/1waNzq-deGBKATXMU9JzMDWdGsF4YkcW/view?usp=sharing).

```python
import math

import cv2
import numpy as np
import torch
import torchvision.transforms.functional as F
from PIL import Image

# Helper from the PIDM repository: run this script from the repo root.
from data.fashion_base_function import get_random_params


def get_label_tensor(path, shape=[256, 256]):
    """Build the target pose tensor from one OpenPose keypoint .txt file.

    shape is (height, width). Returns (label_tensor, face_center).
    """
    scale_param = 0.05
    param = get_random_params(shape, scale_param)

    # Limbs as pairs of 1-based joint indices in the 18-keypoint OpenPose (COCO) order.
    limbSeq = [[2, 3], [2, 6], [3, 4], [4, 5], [6, 7], [7, 8], [2, 9], [9, 10],
               [10, 11], [2, 12], [12, 13], [13, 14], [2, 1], [1, 15], [15, 17],
               [1, 16], [16, 18], [3, 17], [6, 18]]

    colors = [[255, 0, 0], [255, 85, 0], [255, 170, 0], [255, 255, 0], [170, 255, 0], [85, 255, 0], [0, 255, 0],
              [0, 255, 85], [0, 255, 170], [0, 255, 255], [0, 170, 255], [0, 85, 255], [0, 0, 255], [85, 0, 255],
              [170, 0, 255], [255, 0, 255], [255, 0, 170], [255, 0, 85]]
    canvas = np.zeros((shape[0], shape[1], 3), dtype=np.uint8)
    keypoint = np.loadtxt(path)  # one row per joint: column 0 = x, column 1 = y; -1 marks a missing joint
    keypoint = trans_keypoins(keypoint, param, shape)
    stickwidth = 4

    # Draw each detected joint as a filled circle.
    for i in range(18):
        x, y = keypoint[i, 0:2]
        if x == -1 or y == -1:
            continue
        cv2.circle(canvas, (int(x), int(y)), 4, colors[i], thickness=-1)

    # Draw the first 17 limbs as ellipses and keep a binary mask for each limb.
    joints = []
    for i in range(17):
        Y = keypoint[np.array(limbSeq[i]) - 1, 0]
        X = keypoint[np.array(limbSeq[i]) - 1, 1]
        cur_canvas = canvas.copy()
        if -1 in Y or -1 in X:
            # Missing endpoint: use an empty mask so the channel count stays fixed.
            joints.append(np.zeros_like(cur_canvas[:, :, 0]))
            continue
        mX = np.mean(X)
        mY = np.mean(Y)
        length = ((X[0] - X[1]) ** 2 + (Y[0] - Y[1]) ** 2) ** 0.5
        angle = math.degrees(math.atan2(X[0] - X[1], Y[0] - Y[1]))
        polygon = cv2.ellipse2Poly((int(mY), int(mX)), (int(length / 2), stickwidth), int(angle), 0, 360, 1)
        cv2.fillConvexPoly(cur_canvas, polygon, colors[i])
        canvas = cv2.addWeighted(canvas, 0.4, cur_canvas, 0.6, 0)

        joint = np.zeros_like(cur_canvas[:, :, 0])
        cv2.fillConvexPoly(joint, polygon, 255)
        joint = cv2.addWeighted(joint, 0.4, joint, 0.6, 0)
        joints.append(joint)

    # Convert the OpenCV (BGR) canvas to an RGB tensor of shape (3, H, W).
    pose = F.to_tensor(Image.fromarray(cv2.cvtColor(canvas, cv2.COLOR_BGR2RGB)))

    # One L1 distance map per limb mask (distance / 3, clipped to 0..255): 17 channels.
    dist_maps = []
    for joint in joints:
        im_dist = cv2.distanceTransform(255 - joint, cv2.DIST_L1, 3)
        im_dist = np.clip(im_dist / 3, 0, 255).astype(np.uint8)
        dist_maps.append(F.to_tensor(Image.fromarray(im_dist)))
    tensors_dist = torch.cat(dist_maps)

    # 3 skeleton channels + 17 distance maps -> label_tensor of shape (20, H, W)
    label_tensor = torch.cat((pose, tensors_dist), dim=0)

    # Keypoints 14 and 15 (0-based) are the right and left eye.
    if int(keypoint[14, 0]) != -1 and int(keypoint[15, 0]) != -1:
        y0, x0 = keypoint[14, 0:2]
        y1, x1 = keypoint[15, 0:2]
        face_center = torch.tensor([y0, x0, y1, x1]).float()
    else:
        face_center = torch.tensor([-1, -1, -1, -1]).float()
    return label_tensor, face_center


def trans_keypoins(keypoints, param, img_size):
    """Map raw pose.rar keypoints into an img_size = (H, W) frame (modifies keypoints in place)."""
    missing_keypoint_index = keypoints == -1

    # Crop the white border of the original dataset: shift x (column 0) by 40 px.
    keypoints[:, 0] = keypoints[:, 0] - 40

    # Resize. img_size is (H, W), so img_w holds H and img_h holds W here:
    # scale_h (W / 176) is applied to x and scale_w (H / 256) to y.
    img_w, img_h = img_size
    scale_h = 1.0 / 176.0 * img_h
    scale_w = 1.0 / 256.0 * img_w

    if 'scale_size' in param and param['scale_size'] is not None:
        new_h, new_w = param['scale_size']
        scale_w = scale_w / img_w * new_w
        scale_h = scale_h / img_h * new_h

    if 'crop_param' in param and param['crop_param'] is not None:
        w, h, _, _ = param['crop_param']
    else:
        w, h = 0, 0

    keypoints[:, 0] = keypoints[:, 0] * scale_h - h
    keypoints[:, 1] = keypoints[:, 1] * scale_w - w
    keypoints[missing_keypoint_index] = -1  # restore the missing-joint markers
    return keypoints


if __name__ == "__main__":
    # label_tensor (20 x H x W) is the tgt_pose; face_center holds the two eye keypoints.
    tgt_pose_tensor, face_center = get_label_tensor('pose/WOMEN/Dresses/id_00000008/02_3_back.txt')
```

Edit (27/12/2023): fixed a bug so that non-square tgt_pose sizes are supported.
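To turn every file from pose.rar into a target pose file, a batch loop over get_label_tensor is enough. The sketch below is an assumption, not part of the author's script: the helper name convert_pose_dir, the flattened output file names, and the save format are all hypothetical. Check how your checkout reads data/deepfashion_256x256/target_pose/*.npy (np.load or torch.load) and pick the matching `save` option.

```python
from pathlib import Path

import numpy as np


def convert_pose_dir(src_root, dst_root, make_label, save="numpy"):
    """Hypothetical batch converter: every .txt under src_root -> one pose file in dst_root.

    make_label: callable(path) -> (label_tensor, face_center), e.g. get_label_tensor.
    Only label_tensor is written; face_center is discarded.
    save: "numpy" writes with np.save, "torch" with torch.save (loader format is an assumption).
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    dst_root.mkdir(parents=True, exist_ok=True)
    written = []
    for txt in sorted(src_root.rglob("*.txt")):
        label, _face_center = make_label(str(txt))
        # Flatten the relative path so files from different folders cannot collide,
        # e.g. WOMEN/Dresses/id_00000008/02_3_back.txt -> WOMEN_Dresses_id_00000008_02_3_back.npy
        rel = txt.relative_to(src_root).with_suffix("")
        out = dst_root / ("_".join(rel.parts) + ".npy")
        if save == "torch":
            import torch  # only required for this branch
            torch.save(label, out)
        else:
            np.save(str(out), np.asarray(label))
        written.append(out)
    return written
```

For example, run from the repo root with the script above importable: `convert_pose_dir("pose", "data/deepfashion_256x256/target_pose", get_label_tensor)`. Note that get_label_tensor draws random scale/crop parameters via get_random_params, so repeated runs may not produce identical tensors.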

gouchaonijiao commented 1 year ago

Thanks for your reply! I will try it right away!

gouchaonijiao commented 1 year ago

I want to express my gratitude to you again! The code you provided is very helpful to me! Thanks very much.

aravind-h-v commented 1 year ago

Hi, I was able to use the provided script to generate the 20×256×256 NumPy arrays representing the OpenPose data, but is there any way to generate the 18 or 20 pairs of numbers present in each file of pose.rar, maybe starting from the OpenPose Editor JSON output? How exactly were these numbers generated? I am mainly interested in this for inference.
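How pose.rar was produced is not stated in this thread. As an unverified sketch, assuming each .txt holds rows of `x y` in OpenPose's 18-joint COCO order with -1 for missing joints (that is how get_label_tensor indexes them; rows 14 and 15 are the eyes), an OpenPose JSON file could be converted like this. The function name and the `x_offset` parameter are hypothetical; BODY_25 output is mapped to COCO-18 by dropping MidHip (index 8) and the six foot points.

```python
import json

import numpy as np

# BODY_25 indices of the 18 COCO-order joints (skips MidHip=8 and the foot points 19-24).
BODY25_TO_COCO18 = [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]


def openpose_json_to_keypoints(json_text, person=0, x_offset=0.0):
    """Hypothetical converter: OpenPose JSON -> (18, 2) array of (x, y), -1 for undetected joints."""
    data = json.loads(json_text)
    flat = np.asarray(data["people"][person]["pose_keypoints_2d"], dtype=float)
    kp = flat.reshape(-1, 3)  # rows of (x, y, confidence)
    if kp.shape[0] == 25:
        kp = kp[BODY25_TO_COCO18]
    elif kp.shape[0] != 18:
        raise ValueError(f"unexpected number of keypoints: {kp.shape[0]}")
    xy = kp[:, :2].copy()
    xy[:, 0] += x_offset          # optional shift of x into the frame the script expects
    xy[kp[:, 2] == 0] = -1        # OpenPose reports undetected joints with confidence 0
    return xy
```

The result can be written with `np.savetxt("my_pose.txt", xy)` and passed to get_label_tensor. Because trans_keypoins subtracts 40 from x before scaling by W/176, keypoints detected on a 176-pixel-wide DeepFashion image would presumably need `x_offset=40`; this is inferred from the script, not confirmed by the authors.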

wzic commented 1 year ago

Hi! May I know why the reference pose has 20 keypoints? The conventional number of keypoints is 18. Also, the input image seems to be scaled to 256×256, but the size used in training is 256×176. Why is there this difference?
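Part of this can be read off the script above (an interpretation of the code, not an author reply). The generated tensor has 20 channels: 3 RGB skeleton channels plus one distance map for each of the first 17 limbs in limbSeq (the loop is `range(17)`, so the two shoulder-ear limbs are never drawn). Only the first 18 keypoint rows are ever read, so any extra rows in a .txt file are ignored. For the size, trans_keypoins scales x by W/176 and y by H/256, so the default `shape=[256, 256]` stretches the 176-pixel-wide layout horizontally. The hypothetical helpers below reproduce that bookkeeping, ignoring the random scale and crop from get_random_params.

```python
# limbSeq copied from get_label_tensor (1-based joint indices)
LIMB_SEQ = [[2, 3], [2, 6], [3, 4], [4, 5], [6, 7], [7, 8], [2, 9], [9, 10],
            [10, 11], [2, 12], [12, 13], [13, 14], [2, 1], [1, 15], [15, 17],
            [1, 16], [16, 18], [3, 17], [6, 18]]


def label_channel_names(num_limbs=17):
    """Channel layout of label_tensor: 3 skeleton colour channels + one distance map per limb."""
    names = ["pose_R", "pose_G", "pose_B"]
    names += [f"dist_{a}-{b}" for a, b in LIMB_SEQ[:num_limbs]]
    return names


def keypoint_scales(shape):
    """(x_scale, y_scale) that trans_keypoins applies for shape=(H, W), without random scale/crop."""
    h, w = shape
    return w / 176.0, h / 256.0
```

Under this reading, `keypoint_scales((256, 176))` is `(1.0, 1.0)`, while `(256, 256)` gives an x scale of 256/176 ≈ 1.45. Which size the released model expects for tgt_pose at inference is not stated in this thread.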

island443 commented 8 months ago

Hello, what are label_tensor and face_center in the script you provided? The main function returns a tuple; which element is the keypoint information required for model inference?

ankanbhunia commented 8 months ago

label_tensor is the one required for inference.