elliottwu / unsup3d

(CVPR'20 Oral) Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild
MIT License

Questions about evaluation metrics #10

Open YokkaBear opened 4 years ago

YokkaBear commented 4 years ago

Congrats on your best paper award, and thank you for generously open-sourcing the code.

I have been through the training process and obtained a model trained for 70 epochs. After running the test code on the test set, I got a series of directories containing the images used for 3D reconstruction. However, I did not find any file that reports evaluation metrics such as the scale-invariant depth error (SIDE) or mean angle deviation (MAD) mentioned in the paper.

So I wonder how to output, or where to find, these evaluation metrics to measure the performance of my trained model, and whether any ground-truth data is needed for this evaluation.

Looking forward to your reply and help, much thanks.

elliottwu commented 4 years ago

Hi! Evaluation metrics can be computed only when you have ground-truth depth maps, such as in the provided synthetic face dataset. After running the test code with run_test: true and load_gt_depth: true specified in the *.yml config file, you should see a text file named eval_scores.txt in the result folder, which contains the scores.
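For reference, the scores in eval_scores.txt presumably include the SIDE and MAD metrics reported in the paper. Below is a rough, self-contained sketch of those two metrics as I understand them from the paper (standard deviation of log-depth differences, and mean angle between normal maps); it is not the repo's exact evaluation code, and the mask argument is a hypothetical way to restrict the metrics to valid pixels.

import torch

def side(d_pred, d_gt, mask=None, eps=1e-7):
    # scale-invariant depth error: std-dev of log-depth differences,
    # so a global scaling of the predicted depth does not change the score
    delta = torch.log(d_pred + eps) - torch.log(d_gt + eps)
    if mask is not None:
        delta = delta[mask]
    return ((delta ** 2).mean() - delta.mean() ** 2).sqrt()

def mad(n_pred, n_gt, mask=None):
    # mean angle deviation (degrees) between predicted and ground-truth normal maps (...x3)
    cos = (n_pred * n_gt).sum(-1).clamp(-1.0, 1.0)
    ang = torch.rad2deg(torch.acos(cos))
    if mask is not None:
        ang = ang[mask]
    return ang.mean()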

YokkaBear commented 4 years ago

Thank you so much. Excuse me, I have another question about the ground-truth data: how do you generate the ground-truth depth map images (like the one attached)? For example, if I want to check the evaluation metrics of a model trained on the cat face dataset or others, I would need depth map images as ground truth. Thank you.

000002_depth_1_1

elliottwu commented 4 years ago

We used a synthetic face model to obtain ground-truth depth maps for evaluation. We do not have cat face datasets with ground-truth depth maps, so we cannot evaluate the predicted depth maps directly. For human faces, there are datasets with ground-truth 3D scans, such as the NoW dataset, so it is possible to perform a direct evaluation of the depth predictions.

YokkaBear commented 4 years ago

Thank you for your reply. Following your guidance, I have browsed some 3D human face datasets such as the NoW dataset and Bosphorus, but I noticed that their ground-truth data are mostly given as obj/off files representing 3D scans, not as depth map images. So I wonder how you convert the 3D scans in obj/off format into depth maps in png/jpg image format. Thank you.
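In case it helps to illustrate the idea: one naive way to get a depth image from such a scan is to project its vertices with a pinhole camera and keep the nearest point per pixel (a z-buffer over vertices only, ignoring faces). The sketch below is my own illustration, not the authors' pipeline; the 10-degree FOV and image size are assumptions borrowed from the repo's config, and a real evaluation would first need to align the scan with the camera frame.

import numpy as np

def mesh_vertices_to_depth(obj_path, h=256, w=256, fov_deg=10.0):
    # read only the vertex lines of the .obj file
    verts = []
    with open(obj_path) as f:
        for line in f:
            if line.startswith('v '):
                verts.append([float(x) for x in line.split()[1:4]])
    verts = np.asarray(verts)

    fx = fy = (w - 1) / 2 / np.tan(np.deg2rad(fov_deg) / 2)
    cx, cy = (w - 1) / 2, (h - 1) / 2

    depth = np.full((h, w), np.inf)
    x, y, z = verts[:, 0], verts[:, 1], verts[:, 2]
    u = np.round(fx * x / z + cx).astype(int)
    v = np.round(fy * y / z + cy).astype(int)
    ok = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    for ui, vi, zi in zip(u[ok], v[ok], z[ok]):
        depth[vi, ui] = min(depth[vi, ui], zi)  # keep the nearest point per pixel (z-buffer)
    return depth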

elliottwu commented 4 years ago

Cool! There are various ways of converting depth maps to meshes, which can be stored as an .obj file. We have a piece of code in the demo, which does this in a fairly naive way: https://github.com/elliottwu/unsup3d/blob/30f4550b6bab6a520e9dd005dadac637b2fb9eb6/demo/demo.py#L182.
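For illustration, a minimal version of that idea (back-project every pixel with the pinhole intrinsics and connect neighbouring pixels into two triangles per grid cell) could look like the sketch below. It is not the demo's exact code, and the 10-degree FOV matches the repo's setup only by assumption.

import numpy as np

def depth_to_obj(depth, fov_deg=10.0, path='mesh.obj'):
    h, w = depth.shape
    fx = fy = (w - 1) / 2 / np.tan(np.deg2rad(fov_deg) / 2)
    cx, cy = (w - 1) / 2, (h - 1) / 2
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # back-project every pixel to a 3D point
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    verts = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    # connect neighbouring pixels into two triangles per grid cell
    idx = np.arange(h * w).reshape(h, w)
    faces = []
    for i in range(h - 1):
        for j in range(w - 1):
            a, b, c, d = idx[i, j], idx[i, j + 1], idx[i + 1, j], idx[i + 1, j + 1]
            faces += [(a, b, c), (b, d, c)]
    with open(path, 'w') as f:
        for vx, vy, vz in verts:
            f.write(f'v {vx} {vy} {vz}\n')
        for a, b, c in faces:
            f.write(f'f {a + 1} {b + 1} {c + 1}\n')  # .obj indices are 1-based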

YokkaBear commented 4 years ago

Hi @elliottwu, I am now trying to train your unsup3d model on another human face dataset, but I noticed that the images in my dataset are all uncropped, i.e. they still contain the background. Should I apply a face-cropping method to build a new face-cropped dataset for training, or just use the original uncropped dataset? Hoping for your reply, thank you.

elliottwu commented 4 years ago

You should crop the images for better results. The MTCNN face detector (facenet) is a good option. The demo code provides a cropping scheme: https://github.com/elliottwu/unsup3d/blob/30f4550b6bab6a520e9dd005dadac637b2fb9eb6/demo/demo.py#L110.
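A rough cropping sketch along those lines, assuming the facenet-pytorch package provides the MTCNN detector; the margin and output size below are arbitrary choices, not the repo's exact cropping scheme.

from PIL import Image
from facenet_pytorch import MTCNN

mtcnn = MTCNN(keep_all=False)

def crop_face(img_path, out_size=256, margin=0.6):
    img = Image.open(img_path).convert('RGB')
    boxes, _ = mtcnn.detect(img)  # Nx4 boxes as [x0, y0, x1, y1], or None if no face found
    if boxes is None:
        return None
    x0, y0, x1, y1 = boxes[0]
    # enlarge the box so the crop keeps some context around the face
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half = max(x1 - x0, y1 - y0) * (1 + margin) / 2
    crop = img.crop((int(cx - half), int(cy - half), int(cx + half), int(cy + half)))
    return crop.resize((out_size, out_size), Image.BILINEAR)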

YokkaBear commented 4 years ago

@elliottwu big thanks for your demo scripts; the cropping works much better than the methods I found on my own. 👍

YokkaBear commented 4 years ago

Hi @elliottwu, when I tried to apply the unsup3d model to another dataset, I ran into the following situation:
I had trained unsup3d on that dataset for 30 epochs, but when I used the trained model to predict 3D human faces from input face images, I got the following intermediate results:

00001_canonical_albedo.png
00001_canonical_image.png
00001_recon_image_flip.png
00001_recon_image.png

It seems that the reconstructed face retains only half of the original face and loses the other half, and the canonical face then mirrors that single half, leading to a weird result (just a guess; I can't say for sure).
In this case, what should I do to make the model output a better, correct result? Could it be related to the quality of the dataset, or to dataloader parameters such as 'image_size' or 'crop'?
I really hope to hear your thoughts and advice on this, thank you!

YokkaBear commented 3 years ago

Cool! There are various ways of converting depth maps to meshes, which can be stored as an .obj file. We have a piece of code in the demo, which does this in a fairly naive way:

https://github.com/elliottwu/unsup3d/blob/30f4550b6bab6a520e9dd005dadac637b2fb9eb6/demo/demo.py#L182 .

Hi @elliottwu, I was trying to recover a 3D mesh (.obj) from the BFM ground-truth depth using your demo code, but I ran into some trouble implementing it. Would you (or anyone else) be willing to share code for recovering a 3D mesh (.obj) from those GT depth maps? Thanks a lot!

**** update ****

Here is my code for the depth-to-3D recovery; however, when I feed it a ground-truth depth image (Fig. 1) with dimensions (256, 256, 3), I get a strange 3D mesh (Fig. 2). Could anyone tell me how to obtain a correct recovered 3D mesh? This is quite urgent, greatest thanks!!!

Fig. 1: input ground-truth depth image (attached)

Fig. 2: recovered 3D mesh (attached)

import argparse
import math
import os
import pdb

import cv2
import numpy as np
import torch
import torch.nn as nn
from PIL import Image

from utils import *  # modified: provides get_grid and export_to_obj_string from unsup3d's utils

EPS = 1e-7
use_gpu = True  # default
device = 'cuda:1' if use_gpu else 'cpu'  # origin: cuda:1
image_size = 256  # modified: input resize; origin: 64
min_depth = 0.9
max_depth = 1.1
border_depth = 0.7*max_depth + 0.3*min_depth
fov = 10  # in degrees
save_dir = '/root/3dface/unsup3d_modified/demo/images/depth_test/results'

depth_rescaler = lambda d: (1+d)/2 *max_depth + (1-d)/2 *min_depth  # (-1,1) => (min_depth,max_depth)
fx = (image_size-1)/2/(np.tan(fov/2 * np.pi/180))
fy = (image_size-1)/2/(np.tan(fov/2 * np.pi/180))
cx = (image_size-1)/2
cy = (image_size-1)/2
K = [[fx, 0., cx],
     [0., fy, cy],
     [0., 0., 1.]]  # camera intrinsics
K = torch.FloatTensor(K).to(device)
inv_K = torch.inverse(K).unsqueeze(0)  # 1x3x3 inverse intrinsics
K = K.unsqueeze(0)

def depth_to_3d_grid(depth, inv_K_mat=None):
    # back-project every pixel to a 3D point using the inverse intrinsics
    if inv_K_mat is None:
        inv_K_mat = inv_K  # fall back to the module-level matrix (the original `inv_K = inv_K` was a no-op)
    b, h, w = depth.shape
    grid_2d = get_grid(b, h, w, normalize=False).to(depth.device)  # Nxhxwx2
    depth = depth.unsqueeze(-1)
    grid_3d = torch.cat((grid_2d, torch.ones_like(depth)), dim=3)
    grid_3d = grid_3d.matmul(inv_K_mat.transpose(2, 1)) * depth
    return grid_3d

def get_normal_from_depth(depth):
    b, h, w = depth.shape
    grid_3d = depth_to_3d_grid(depth, inv_K)

    # finite differences along u and v give two tangent vectors; their cross product is the surface normal
    tu = grid_3d[:,1:-1,2:] - grid_3d[:,1:-1,:-2]
    tv = grid_3d[:,2:,1:-1] - grid_3d[:,:-2,1:-1]
    normal = tu.cross(tv, dim=3)

    zero = normal.new_tensor([0,0,1])
    normal = torch.cat([zero.repeat(b,h-2,1,1), normal, zero.repeat(b,h-2,1,1)], 2)
    normal = torch.cat([zero.repeat(b,1,w,1), normal, zero.repeat(b,1,w,1)], 1)
    normal = normal / (((normal**2).sum(3, keepdim=True))**0.5 + EPS)
    return normal

if __name__ == "__main__":
    input_path = '/root/3dface/unsup3d_modified/demo/images/depth_test/000008_depth_1_1.png'  # input depth image
    depth_input = cv2.imread(input_path)
    canon_depth = torch.Tensor(depth_input).permute(2, 0, 1)  # 3x256x256
    canon_depth = canon_depth.mean(dim=0, keepdim=True)  # 1x256x256, averaged over the 3 channels

    ## predict canonical depth (rescale)
    b = 1  # batch size is 1 before the torch.cat below
    canon_depth = canon_depth - canon_depth.view(b, -1).mean(1).view(b, 1, 1)  # zero-center the depth
    # note: the centered pixel values are still roughly in (-128, 128) here, so tanh saturates to +/-1 for most pixels
    canon_depth = canon_depth.tanh()  # squash to (-1, 1)
    canon_depth = depth_rescaler(canon_depth)

    ## clamp border depth
    h = w = image_size
    canon_depth = canon_depth.to(device)
    depth_border = torch.zeros(1, h, w - 4).to(device)
    depth_border = nn.functional.pad(depth_border, (2, 2), mode='constant', value=1)
    canon_depth = canon_depth * (1 - depth_border) + depth_border * border_depth
    canon_depth = torch.cat([canon_depth, canon_depth.flip(2)], 0)  # cat along dim 0: 1x256x256 -> 2x256x256 (original + flipped)

    print(canon_depth)  # debug
    pdb.set_trace()  # breakpoint
    canon_normal = get_normal_from_depth(canon_depth)  # 2x256x256x3

    ## export to obj strings
    vertices = depth_to_3d_grid(canon_depth, inv_K)  # BxHxWx3, B=2 (original + flip), both on GPU
    objs, mtls = export_to_obj_string(vertices, canon_normal)  # error: the thread gets stuck at this line

    # rewrote "with open(...) as f" into explicit open/close
    f = open(os.path.join(save_dir, 'result.mtl'), "w")
    f.write(mtls[0].replace('$TXTFILE', input_path))
    f.close()
    f = open(os.path.join(save_dir, 'result.obj'), "w")
    f.write(objs[0].replace('$MTLFILE', './result.mtl'))
    f.close()

Kazamori1 commented 2 years ago

(quoting @YokkaBear's earlier comment about the half-face reconstruction results)

Hello @YokkaBear! While reproducing this paper I also got reconstructed face images showing only half of the face. Have you solved this problem? Looking forward to your reply, thanks!

YokkaBear commented 2 years ago

Overall it was a problem with the dataset. I suggest doing some data preprocessing (e.g., cropping out the key region, as the author suggested) or data augmentation, or switching to a different dataset.


Kazamori1 commented 2 years ago

(quoting @YokkaBear's reply above)

OK, thank you!