fyviezhao / M3D-VTON

Official code for ICCV2021 paper "M3D-VTON: A Monocular-to-3D Virtual Try-on Network"

How to extract double depth map in .npy format from 2D images #8

Open · sanazsab opened this issue 2 years ago

sanazsab commented 2 years ago

Thanks for your interesting paper, "M3D-VTON: A Monocular-to-3D Virtual Try-On Network". I'm working in a similar area and need to know how I could extract the ground-truth depth (.npy files) for 2D images. Could you please guide me in this regard? My next question is how you obtain the camera specification from 2D images.

fyviezhao commented 2 years ago

Hi, we run PIFuHD on 2D images and then orthographically project the predicted mesh to obtain the GT depths. The camera parameters can be found here.

sanazsab commented 2 years ago

Thank you so much for your answer. How do you obtain the camera parameters? Since my pictures have different widths and heights, may I also use these parameters? I also refer to Pyrender: https://pyrender.readthedocs.io/en/latest/generated/pyrender.camera.OrthographicCamera.html?highlight=pyrender.camera.OrthographicCamera#pyrender.camera.OrthographicCamera

But I don't know how to determine the inputs. Could you please guide me on how I could obtain the inputs for Pyrender?

sanazsab commented 2 years ago

> Hi, we run PIFuHD on 2D images and then orthographically project the predicted mesh to obtain the GT depths. The camera parameters can be found here.

Also, the output of Pyrender is not in .npy format. How should I handle that?

sanazsab commented 2 years ago

> Hi, we run PIFuHD on 2D images and then orthographically project the predicted mesh to obtain the GT depths. The camera parameters can be found here.

Thanks for your attention. Why did you change the background of the images to black? My mesh results from PIFuHD are not good and I'm wondering how I could improve them. Thanks!

fyviezhao commented 2 years ago

> Thank you so much for your answer. How do you obtain the camera parameters? Since my pictures have different widths and heights, may I also use these parameters? I also refer to Pyrender: https://pyrender.readthedocs.io/en/latest/generated/pyrender.camera.OrthographicCamera.html?highlight=pyrender.camera.OrthographicCamera#pyrender.camera.OrthographicCamera
>
> But I don't know how to determine the inputs. Could you please guide me on how I could obtain the inputs for Pyrender?

We select the camera parameters based on the fact that the PIFuHD meshes always reside in a unit box. Therefore you can use the same parameters for non-square images. The following code snippet may help you obtain the ground truth depth from the estimated PIFuHD meshes:

import numpy as np
import pyrender
import trimesh
import os
os.environ['PYOPENGL_PLATFORM'] = 'egl'  # for headless server

def render_depth(mesh_path, camera_pose, im_height, im_width):
    # orthographic camera; PIFuHD meshes lie in a unit box, so xmag = ymag = 1.0 covers them for any image size
    camera = pyrender.camera.OrthographicCamera(xmag=1.0, ymag=1.0, znear=1.0, zfar=3.0)
    mesh = pyrender.Mesh.from_trimesh(trimesh.load(mesh_path))
    light = pyrender.PointLight(color=[1.0, 0.0, 0.0], intensity=2.0)

    scene = pyrender.Scene()
    scene.add(mesh, pose=np.eye(4))
    scene.add(camera, pose=camera_pose)
    scene.add(light, pose=camera_pose)

    r = pyrender.OffscreenRenderer(viewport_width=im_width, viewport_height=im_height, point_size=1.0)
    color, depth, depth_glwin = r.render(scene)  # depth_glwin requires the modified renderer.py (see NOTE below)
    r.delete()

    return color, depth, depth_glwin

if __name__ == '__main__':
    # front camera: 2 units in front of the mesh along +z
    cam_pose_front = np.eye(4)
    cam_pose_front[2,3] = 2.

    # back camera: flip x and z (a 180-degree rotation about the y-axis) and place it 2 units behind the mesh
    cam_pose_back = np.eye(4)
    cam_pose_back[2,3] = 2.
    cam_pose_back[0,0] *= -1.
    cam_pose_back[2,2] *= -1.
    cam_pose_back[2,3] *= -1.

    mesh_path = '/path/to/pifuhd/mesh'
    assert mesh_path.endswith('.obj')
    # render front depth map
    color, depth, depth_glwin_front = render_depth(mesh_path, cam_pose_front, im_height=512, im_width=320)
    np.save('front_depth.npy', depth_glwin_front)
    # render back depth map
    color, depth, depth_glwin_back = render_depth(mesh_path, cam_pose_back, im_height=512, im_width=320)
    np.save('back_depth.npy', depth_glwin_back)

NOTE: The current master branch of pyrender fails to recover the raw depth for orthographic cameras (see here and here). We provide a modified rendering script for use with M3D-VTON here. Please first pip install pyrender and then replace the pyrender/renderer.py script with our modified renderer.py script (see the differences between lines 1151-1195). You can then save the returned depth_glwin as a .npy file.
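
If metric depth is needed rather than the raw buffer values, and assuming depth_glwin holds the window-space depth in [0, 1] (an assumption based on the name; please verify against the modified renderer), the orthographic depth mapping is linear and can be inverted with a small sketch like this:

import numpy as np

znear, zfar = 1.0, 3.0                 # must match the OrthographicCamera arguments above
d = np.load('front_depth.npy')         # window-space depth in [0, 1] (assumption)
background = d >= 1.0                  # empty pixels end up at the far plane
metric_depth = znear + d * (zfar - znear)  # linear mapping for orthographic cameras
metric_depth[background] = 0.          # zero out empty pixels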

fyviezhao commented 2 years ago

> Hi, we run PIFuHD on 2D images and then orthographically project the predicted mesh to obtain the GT depths. The camera parameters can be found here.
>
> Thanks for your attention. Why did you change the background of the images to black? My mesh results from PIFuHD are not good and I'm wondering how I could improve them. Thanks!

We found that PIFuHD performs more stably when the person is centered in the image with a black background. What do your images and the estimated PIFuHD meshes look like?

sanazsab commented 2 years ago

> Hi, we run PIFuHD on 2D images and then orthographically project the predicted mesh to obtain the GT depths. The camera parameters can be found here.
>
> Thanks for your attention. Why did you change the background of the images to black? My mesh results from PIFuHD are not good and I'm wondering how I could improve them. Thanks!
>
> We found that PIFuHD performs more stably when the person is centered in the image with a black background. What do your images and the estimated PIFuHD meshes look like?

Thank you so much for your quick response and your support.

Did you use the demo with its original height and width, or did you change it based on your images? When I change the size, the results are not great. And may I ask how you convert the images to a black background? My images are also from MPV but at a lower resolution of 192*256. I attach one here:

[attached image]

fyviezhao commented 2 years ago

Sorry for the late reply. Yes, I pad the MPV 512*320 images to 512*512 and then use the original PIFuHD demo with its default image size (i.e., 512*512). PIFuHD may fail due to the small size of your input images. Have you tried changing the --resolution option in this line to 256 after padding your 256*192 images to 256*256?
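
For reference, here is a minimal padding sketch with OpenCV (file names are placeholders; it simply adds black borders to make the image square, which also matches the black background discussed below):

import cv2

img = cv2.imread('person_512x320.png')   # placeholder path
h, w = img.shape[:2]
pad_left = (h - w) // 2                  # assumes height >= width
pad_right = h - w - pad_left
square = cv2.copyMakeBorder(img, 0, 0, pad_left, pad_right,
                            cv2.BORDER_CONSTANT, value=(0, 0, 0))  # black borders
cv2.imwrite('person_512x512.png', square)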

It is easy to black out the image background either by simply using remove.bg or by human parsing (such as this or this).

sanazsab commented 2 years ago

No worries, thanks a bunch for your attention.

Yes, I have tried changing it, but it does not affect the results. Maybe that is because I did not pad to the same size; the quality of the pictures is very important.

Thanks for the suggestion. The first app needs to be applied to each image individually and takes time for multiple images. The second set of GitHub links is for parsing; I do not know how to find the related module to black out the background. I used GrabCut with Python cv2 to black out the background, but some parts are still white.

fyviezhao commented 2 years ago

Is there some reason that you chose to use the 256*192 MPV images instead of 512*320? PIFuHD performs well on 512*512 images but may not fit 256*256 (which is not that "HD"?). Padding might be a problem, but the low image resolution can also harm the 3D reconstruction quality.

Moreover, I would not recommend using GrabCut to segment the person images. For most person images, the human parsing methods are good enough for obtaining the background mask and blacking it out:

import cv2
import numpy as np

person_img = cv2.imread(person_img_path)
human_parsing_result = parsing_model(person_img)  # the aforementioned github links for parsing
background = np.where(human_parsing_result == 0)  # obtain the background mask (label 0 = background)
person_img[background] = 0  # change background to black

sanazsab commented 2 years ago
> parsing_model

Thanks a lot for your insight.

Yes, it's true.

That's a good suggestion. I used CIHP for parsing, but this part, person_img[background] = 0, gives an error.

Is there any easier way to black out the background? I used:

image = cv2.imread(path)
r = 150.0 / image.shape[1]
dim = (150, int(image.shape[0] * r))
resized = cv2.resize(image, dim, interpolation=cv2.INTER_AREA)
lower_white = np.array([220, 220, 220], dtype=np.uint8)
upper_white = np.array([255, 255, 255], dtype=np.uint8)
mask = cv2.inRange(resized, lower_white, upper_white)  # could also use threshold
res = cv2.bitwise_not(resized, resized, mask)
cv2.imshow('res', res)  # gives black background
cv2.imwrite('0A.png', res)

But for some pictures it does not work.

aryacodez commented 2 years ago

Can you share your Colab notebook with the changes you made? I am also working on a similar project and facing a similar issue.

LogWell commented 2 years ago

Hi @fyviezhao, do you know how to restore a 3D point cloud from the rendered depth map in pyrender, where the camera is any of the three modes in camera?
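
For the OrthographicCamera used earlier in this thread, the back-projection could look like the sketch below (an illustration only, not the authors' code; it assumes the depth map stores distance along the camera's viewing direction with background pixels set to 0, and the perspective cases would need the intrinsics instead):

import numpy as np

def ortho_depth_to_pointcloud(depth, cam_pose, xmag=1.0, ymag=1.0):
    # Back-project an orthographic depth map to a world-space point cloud.
    h, w = depth.shape
    valid = depth > 0                                   # assume background pixels are 0
    v, u = np.nonzero(valid)
    x = (2.0 * (u + 0.5) / w - 1.0) * xmag              # pixel centers -> NDC -> camera-space x
    y = (1.0 - 2.0 * (v + 0.5) / h) * ymag              # image rows run top-down, camera y points up
    z = -depth[valid]                                   # the camera looks down its -z axis
    pts_cam = np.stack([x, y, z, np.ones_like(x)], axis=0)
    pts_world = cam_pose @ pts_cam                      # pyrender camera poses are camera-to-world
    return pts_world[:3].T

# e.g. for the front view rendered above (after converting the buffer to metric depth):
# pc = ortho_depth_to_pointcloud(metric_depth, cam_pose_front)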