NVlabs / eg3d


FFHQ dataset camera pose labeling #18

Open l4rz opened 2 years ago

l4rz commented 2 years ago

"We use an off-the-shelf face detection and pose-extraction pipeline to both identify the face region and label the image with a pose"

Would it be possible to add the code to generate camera pose labels ("cameras.json") to this repo?

ericryanchan commented 2 years ago

Hi l4rz,

We're not planning on releasing the pose-extraction code as part of this repo but you can use Deep3DFaceRecon_pytorch to crop the images and extract camera parameters. If you email me directly, I can give you the modifications you'll need if you want to export EG3D dataset files.

Eric

X-niper commented 2 years ago

> We're not planning on releasing the pose-extraction code as part of this repo but you can use Deep3DFaceRecon_pytorch to crop the images and extract camera parameters. If you email me directly, I can give you the modifications you'll need if you want to export EG3D dataset files.

Hi Eric,

I followed this link and obtained the angles and translation using deep3dfacerecon_pytorch. I compute the rotation matrix with the following function:

import numpy as np

def compute_rotation(angles):
    # Euler angles (x, y, z) in radians -> rotation matrix R = Rz @ Ry @ Rx
    x, y, z = angles
    rot_x = np.array([
        [1, 0, 0],
        [0, np.cos(x), -np.sin(x)],
        [0, np.sin(x), np.cos(x)]
    ])
    rot_y = np.array([
        [np.cos(y), 0, np.sin(y)],
        [0, 1, 0],
        [-np.sin(y), 0, np.cos(y)]
    ])
    rot_z = np.array([
        [np.cos(z), -np.sin(z), 0],
        [np.sin(z), np.cos(z), 0],
        [0, 0, 1]
    ])
    return np.matmul(rot_z, np.matmul(rot_y, rot_x))

The rotation matrices I get are different from the ones you provide in dataset.json. May I know which angles-to-rotation-matrix function you use when processing the data? It seems I need to multiply the second and third rows by -1 to get the same results.

In summary, I wonder how to get the 4x4 extrinsics from the angles and translation estimated with deep3dfacerecon_pytorch.
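A minimal sketch of the sign-flip observation above: negating the second and third rows of a rotation matrix is the same as left-multiplying by diag(1, -1, -1). Whether this alone bridges the Deep3DFaceRecon_pytorch and EG3D conventions is an assumption, not something confirmed here.

import numpy as np

# Negating rows 2 and 3 of a rotation matrix == left-multiplying by diag(1, -1, -1).
FLIP_YZ = np.diag([1.0, -1.0, -1.0])

def flip_second_and_third_rows(rot: np.ndarray) -> np.ndarray:
    # Hypothetical mapping between the two conventions, based on the observation above.
    return FLIP_YZ @ rot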

jiaxinxie97 commented 2 years ago

Hi Eric,

I also have a question about the pose extraction part: it seems the origin of deep3dfacerecon_pytorch's coordinate system is different from that of eg3d. Can you provide the translation vector between them? I am not sure whether they differ only in the origin; the axis orientations may differ as well. If so, the full 4x4 transformation matrix is needed.

Thank you!

X-niper commented 2 years ago

> I also have a question about the pose extraction part: it seems the origin of deep3dfacerecon_pytorch's coordinate system is different from that of eg3d. Can you provide the translation vector between them? I am not sure whether they differ only in the origin; the axis orientations may differ as well. If so, the full 4x4 transformation matrix is needed.

Hi, I get similar results using the following code. I am not sure it's correct, but the 4x4 extrinsics look similar to those in the provided dataset.json file. It seems the camera distance is assumed to be fixed at 2.7. You can give it a try; I hope the code helps.

import numpy as np


def compute_rotation(angles):
    # Euler angles (x, y, z) in radians -> rotation matrix R = Rz @ Ry @ Rx
    x, y, z = angles
    rot_x = np.array([
        [1, 0, 0],
        [0, np.cos(x), -np.sin(x)],
        [0, np.sin(x), np.cos(x)]
    ])
    rot_y = np.array([
        [np.cos(y), 0, np.sin(y)],
        [0, 1, 0],
        [-np.sin(y), 0, np.cos(y)]
    ])
    rot_z = np.array([
        [np.cos(z), -np.sin(z), 0],
        [np.sin(z), np.cos(z), 0],
        [0, 0, 1]
    ])
    return np.matmul(rot_z, np.matmul(rot_y, rot_x))


def get_extrinsics_from_euler_and_translation(euler: np.ndarray, trans: np.ndarray):
    """Euler angles and translation are estimated with deep3dfacerecon_pytorch."""
    theta_x, theta_y, theta_z = euler[0], euler[1], euler[2]
    # Flip the angles so they (approximately) match EG3D's camera convention.
    theta_x = np.pi - theta_x
    theta_y = -theta_y
    theta_z = theta_z
    rot_mat = compute_rotation([theta_x, theta_y, theta_z])
    trans_x = -trans[0]
    trans_y = trans[1]
    # Assume the camera sits at a fixed distance of 2.7 from the origin.
    trans_z = np.sqrt(2.7 ** 2 - trans_x ** 2 - trans_y ** 2)
    trans_new = np.matmul(rot_mat, np.array([trans_x, trans_y, trans_z]))
    mat_4x4 = np.eye(4)
    mat_4x4[0:3, 0:3] = rot_mat
    mat_4x4[0:3, 3] = -trans_new
    return mat_4x4


if __name__ == "__main__":
    test_fname = '00039.png'
    euler_estimated = np.array([0.15566148, -0.18466546, -0.01471091])
    trans_estimated = np.array([0.03106074, 0.02740563, 0.07686124])
    mat_4x4 = get_extrinsics_from_euler_and_translation(euler_estimated, trans_estimated)
    print(mat_4x4)
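A couple of quick sanity checks on the matrix this produces (just a sketch, not part of the original snippet; it reuses get_extrinsics_from_euler_and_translation from above):

import numpy as np

mat_4x4 = get_extrinsics_from_euler_and_translation(
    np.array([0.15566148, -0.18466546, -0.01471091]),
    np.array([0.03106074, 0.02740563, 0.07686124]))
R, t = mat_4x4[:3, :3], mat_4x4[:3, 3]
print(np.allclose(R @ R.T, np.eye(3)))  # rotation block is orthonormal -> True
print(np.linalg.det(R))                 # proper rotation -> ~ +1.0
print(np.linalg.norm(t))                # camera distance, 2.7 by construction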
e4s2022 commented 2 years ago

Hi, @X-niper

I also noticed this issue: the rotation matrix obtained directly from compute_rotation is different from the provided extrinsics in dataset.json. Could you please explain the convention differences between Deep3DFaceRecon_pytorch and EG3D, and say a bit more about how your code bridges this gap?

I am a newbie just starting out in 3D vision. Thanks in advance.

X-niper commented 2 years ago

Hi @bd20222,

We can recover the Euler angles from the extrinsics in dataset.json. I compared the derived Euler angles with the ones estimated by Deep3DFaceRecon_pytorch and found the differences. I noticed that the camera distance is fixed at 2.7 when reading the generate_sample.py code, and this guess is confirmed by the translation recovered from the extrinsics in dataset.json.
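As a quick way to check the fixed 2.7 camera distance yourself, here is a sketch; it assumes dataset.json stores {"labels": [[filename, 25 floats], ...]} with the first 16 floats forming the row-major 4x4 extrinsic, which matches the entries quoted later in this thread but is still my assumption.

import json
import numpy as np

# Collect the camera-to-origin distance for every entry in dataset.json.
with open('dataset.json') as f:
    labels = json.load(f)['labels']

dists = [np.linalg.norm(np.array(label[:16]).reshape(4, 4)[:3, 3])
         for _, label in labels]
print(np.mean(dists), np.std(dists))  # expect ~2.7 with a tiny spread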

I attach the code below, with which you can recover the Euler angles from a rotation matrix.

import numpy as np

def compute_angle_from_matrix(matrix3x3):
    # Inverse of compute_rotation above (R = Rz @ Ry @ Rx),
    # assuming theta_y lies within (-pi/2, pi/2).
    M = matrix3x3
    theta_y = np.arcsin(-M[2, 0])
    theta_z = np.arctan2(M[1, 0], M[0, 0])
    theta_x = np.arctan2(M[2, 1], M[2, 2])
    return (theta_x, theta_y, theta_z)
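A quick round-trip check (just a sketch), using compute_rotation from the earlier comment:

angles_in = [0.1, -0.2, 0.05]
print(compute_angle_from_matrix(compute_rotation(angles_in)))
# expect approximately (0.1, -0.2, 0.05)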
e4s2022 commented 2 years ago

@X-niper

Thanks, I understand the general steps. Could you explain the meaning of

theta_x = np.pi - theta_x
theta_y = -theta_y
theta_z = theta_z 

Maybe through an intuitive example? I guess it has something to do with the order and orientation of the axes.

I also found that the implementation of the LookAtPoseSampler class in camera_utils.py is similar to your angle transformation.

class LookAtPoseSampler:
    """
    Same as GaussianCameraPoseSampler, except the
    camera is specified as looking at 'lookat_position', a 3-vector.

    Example:
    For a camera pose looking at the origin with the camera at position [0, 0, 1]:
    cam2world = LookAtPoseSampler.sample(math.pi/2, math.pi/2, torch.tensor([0, 0, 0]), radius=1)
    """

    @staticmethod
    def sample(horizontal_mean, vertical_mean, lookat_position, horizontal_stddev=0, vertical_stddev=0, radius=1, batch_size=1, device='cpu'):
        h = torch.randn((batch_size, 1), device=device) * horizontal_stddev + horizontal_mean
        v = torch.randn((batch_size, 1), device=device) * vertical_stddev + vertical_mean
        v = torch.clamp(v, 1e-5, math.pi - 1e-5)

        theta = h 
        v = v / math.pi
        phi = torch.arccos(1 - 2*v)  

        camera_origins = torch.zeros((batch_size, 3), device=device)

        camera_origins[:, 0:1] = radius*torch.sin(phi) * torch.cos(math.pi-theta)
        camera_origins[:, 2:3] = radius*torch.sin(phi) * torch.sin(math.pi-theta)
        camera_origins[:, 1:2] = radius*torch.cos(phi)

        # forward_vectors = math_utils.normalize_vecs(-camera_origins)
        forward_vectors = math_utils.normalize_vecs(lookat_position - camera_origins) 
        return create_cam2world_matrix(forward_vectors, camera_origins)
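Based on the dataset.json entries quoted later in this thread (16 extrinsic values followed by 9 intrinsic values), a hedged sketch of assembling a conditioning label from LookAtPoseSampler might look like the following. The intrinsics values (focal 4.2647, principal point 0.5) are copied from such an entry; the look-at point, the radius of 2.7, and the 16 + 9 layout are assumptions on my part, and the import assumes you run this inside the eg3d code directory.

import math
import torch
from camera_utils import LookAtPoseSampler  # from the eg3d repo

# Intrinsics copied from a dataset.json entry shown below in this thread.
intrinsics = torch.tensor([[4.2647, 0.0, 0.5],
                           [0.0, 4.2647, 0.5],
                           [0.0, 0.0, 1.0]])

# Roughly frontal pose: camera at radius 2.7 looking at the origin.
cam2world = LookAtPoseSampler.sample(math.pi / 2, math.pi / 2,
                                     torch.tensor([0.0, 0.0, 0.0]),
                                     radius=2.7)                 # [1, 4, 4]
label = torch.cat([cam2world.reshape(-1, 16),
                   intrinsics.reshape(1, 9)], dim=1)             # [1, 25]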
41xu commented 2 years ago

> I also have a question about the pose extraction part: it seems the origin of deep3dfacerecon_pytorch's coordinate system is different from that of eg3d.

> Hi, I get similar results using the following code. I am not sure it's correct, but the 4x4 extrinsics look similar to those in the provided dataset.json file. It seems the camera distance is assumed to be fixed at 2.7. (Code quoted from the comment above.)

Hi @X-niper,

I followed Deep3DFaceRecon and extracted the euler and trans coefficients, but the result I get for 00039 is different from yours in the code above. How did you obtain the euler and trans values? I use a facial landmark detector to extract landmarks and follow the instructions in https://github.com/sicxu/Deep3DFaceRecon_pytorch#test-with-custom-images. The facerecon network returns output_coeff (output_coeff = self.net_recon(self.input_img)), from which we get euler and trans. Is there anything wrong with my process?

X-niper commented 2 years ago

@41xu Hi, I use the original FFHQ cropped images, and MTCNN to detect the facial landmarks.
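For reference, a hedged sketch of producing the 5-point landmark files that Deep3DFaceRecon_pytorch's custom-image pipeline expects (a "detections" folder with one <name>.txt per image), using the facenet_pytorch MTCNN. The exact "x y per line" layout of the .txt files is an assumption based on that repo's README, so verify it against their examples.

import os
from PIL import Image
from facenet_pytorch import MTCNN

mtcnn = MTCNN()

def write_landmarks(image_path, out_dir='detections'):
    # Detect one face and write its 5 landmarks (left eye, right eye, nose,
    # mouth corners) to <out_dir>/<image_name>.txt, one "x y" pair per line.
    os.makedirs(out_dir, exist_ok=True)
    img = Image.open(image_path).convert('RGB')
    boxes, probs, points = mtcnn.detect(img, landmarks=True)
    if points is None:
        return  # no face found
    name = os.path.splitext(os.path.basename(image_path))[0]
    with open(os.path.join(out_dir, name + '.txt'), 'w') as f:
        for x, y in points[0]:
            f.write(f'{x:.2f} {y:.2f}\n')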

lyx0208 commented 2 years ago

Hi @ericryanchan, thanks for your great work! I've recently been doing inversion on your framework and get rather good results on the FFHQ dataset with the cropping method and camera pose data from runme.py. Would it be possible to share the code for cropping and computing camera poses on in-the-wild images with me via email?

e4s2022 commented 2 years ago

I'd also like to know how to extract camera poses for in-the-wild images. Please let me know if you have any ideas, thank you. Using the raw output coefficients of Deep3DFaceRecon directly does not seem right.

blandocs commented 2 years ago

Hi @ericryanchan and @X-niper, thank you for the great work and comments :) I read all of the comments in this thread and ran various experiments, but I failed to find a solution.

My goal is to generate dataset.json (which contains the extrinsic matrices) for arbitrary images. The steps below are what I've done so far.

Step 1. Get coeffs from Deep3DRecon (also used mtcnn to detect the face).

# By using facerecon_model.py we can get the coeff
output_coeff = self.net_recon(self.input_img)
result = self.facemodel.split_coeff(output_coeff)

# get euler angle: result['angle']
# get translation info: result['trans']

Step 2. Get the extrinsic matrix using @X-niper's code.

mat_4x4 = get_extrinsics_from_euler_and_translation(result['angle'], result['trans'])

Step 3. Compare my mat_4x4 result with the ground-truth extrinsic matrix in EG3D's dataset.json. I followed the EG3D FFHQ preprocessing and used exactly the same image that EG3D used.

Unfortunately, the extrinsic matrices are similar but not identical, and accordingly the image rendered using my extrinsics is also different from the original one. I used the 00001.png image.

# my estimated 4x4 extrinsic matrix: 
[[ 0.98808561 -0.03921765  0.14882481 -0.40602439]
 [-0.00870278 -0.97967942 -0.20038081  0.550637  ]
 [ 0.15365908  0.19669821 -0.96834847  2.61188504]
 [ 0.          0.          0.          1.        ]]
# euler angle: (2.941191690164157, -0.1542702688887573, -0.008807490044743849)

# ground-truth 4x4 extrinsic matrix from dataset.json(EG3D): 
[[ 0.98884588  0.01061465  0.14856379 -0.37894215]
 [ 0.04037552 -0.97921258 -0.19877771  0.5142959 ]
 [ 0.14336558  0.20255886 -0.96871883  2.62333806]
 [ 0.          0.          0.          1.        ]]
# euler angle: (2.9354628454139498, -0.1438612908670918, 0.04080828469625245)
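For reference, a small sketch (not from the original comment) to quantify the gap between these two matrices: the geodesic angle between the rotation blocks and the Euclidean distance between the translation columns.

import numpy as np

est = np.array([[ 0.98808561, -0.03921765,  0.14882481, -0.40602439],
                [-0.00870278, -0.97967942, -0.20038081,  0.550637  ],
                [ 0.15365908,  0.19669821, -0.96834847,  2.61188504],
                [ 0.0,         0.0,         0.0,         1.0       ]])
gt = np.array([[ 0.98884588,  0.01061465,  0.14856379, -0.37894215],
               [ 0.04037552, -0.97921258, -0.19877771,  0.5142959 ],
               [ 0.14336558,  0.20255886, -0.96871883,  2.62333806],
               [ 0.0,         0.0,         0.0,         1.0       ]])

# Relative rotation and its geodesic angle in degrees.
R_rel = est[:3, :3].T @ gt[:3, :3]
angle_deg = np.degrees(np.arccos(np.clip((np.trace(R_rel) - 1) / 2, -1, 1)))
trans_gap = np.linalg.norm(est[:3, 3] - gt[:3, 3])
print(angle_deg, trans_gap)  # a few degrees of rotation, a small translation offset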

Left: inversion result rendered with the ground-truth extrinsic parameters from dataset.json (EG3D). Right: inversion result rendered with my estimated extrinsic parameters. [image]

Could you give me any advice or camera matrix extraction codes?

Thank you in advance and have a great day!

41xu commented 2 years ago
> # euler angle: (2.9354628454139498, -0.1438612908670918, 0.04080828469625245)

I get the same result.

mlnyang commented 2 years ago

Same here. While doing inversion on CelebA-HQ, I used @X-niper's code to get the extrinsics and fed them to the generator.

[image] Left: GT. Middle: using @X-niper's code. Right: using a frontal-face GT extrinsic from dataset.json (picked randomly). I used the look-at extrinsic for visualization. It is strange that the GT extrinsic from dataset.json works well (the same holds for other samples).

@ericryanchan, could you please share the code that computes the 4x4 extrinsic from the Euler angles and translation?

X-niper commented 2 years ago

Hi @blandocs, @41xu, @mlnyang,

I am sorry to say that my guessed get_extrinsics_from_euler_and_translation function is not exactly what Eric uses in this work. Please email Eric to get the code for this step. What I mean is that we cannot reproduce the exact same process without the code shared by Eric. The code is in Eric's private repo, so I can't copy it here.

Besides, please follow the runme.py script in this repo to crop your face data (you can see that one function is imported from Deep3DFaceRecon_pytorch) before feeding it to the pose estimator; then the poses can be extracted correctly.

Best of luck

bernakabadayi commented 2 years ago

> My goal is to generate dataset.json (which contains the extrinsic matrices) for arbitrary images.

> Unfortunately, the extrinsic matrices are similar but not identical, and accordingly the image rendered using my extrinsics is also different from the original one. I used the 00001.png image.

Hi @blandocs, the entry I got from dataset.json for that image is below, and it is different from your GT. Where did you get your file? Mine comes from here: https://github.com/NVlabs/eg3d/blob/0b38adcc2bed6b4fda922efd6ec747e1216dc1fd/dataset_preprocessing/ffhq/runme.py#L70

['00001.png', [0.9950790405273438, 0.016893357038497925, -0.09763375669717789, 0.24722558007395418, 0.00013333745300769806, -0.9855860471725464, -0.16917484998703003, 0.45264816608040714, -0.09908440709114075, 0.16832932829856873, -0.9807382822036743, 2.6502809568612045, 0.0, 0.0, 0.0, 1.0, 4.2647, 0.0, 0.5, 0.0, 4.2647, 0.5, 0.0, 0.0, 1.0]]
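For anyone parsing these entries, a small sketch (not the repo's code) that splits one such 25-element label; the 16 + 9 split and the row-major layout are inferred from the values shown above, where the last 9 numbers clearly form a pinhole intrinsics matrix.

import numpy as np

label = [0.9950790405273438, 0.016893357038497925, -0.09763375669717789, 0.24722558007395418,
         0.00013333745300769806, -0.9855860471725464, -0.16917484998703003, 0.45264816608040714,
         -0.09908440709114075, 0.16832932829856873, -0.9807382822036743, 2.6502809568612045,
         0.0, 0.0, 0.0, 1.0,
         4.2647, 0.0, 0.5, 0.0, 4.2647, 0.5, 0.0, 0.0, 1.0]

extrinsic = np.array(label[:16]).reshape(4, 4)   # appears to be the cam2world matrix
intrinsics = np.array(label[16:]).reshape(3, 3)  # normalized focal length and principal point
print(np.linalg.norm(extrinsic[:3, 3]))          # camera distance from the origin, ~2.7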

luminohope commented 2 years ago

Hi all, I just pushed additional scripts that can crop and extract poses from in-the-wild portrait images in a way that is compatible with the FFHQ checkpoints. Hope that is useful. https://github.com/NVlabs/eg3d/blob/main/dataset_preprocessing/ffhq/preprocess_in_the_wild.py

BenjiKCF commented 2 years ago

So if I am working with objects other than faces, can I still use Deep3DFaceRecon_pytorch to generate the camera matrix? How would I get a camera matrix in that case? Thanks.

usmancheema89 commented 2 years ago

> Hi all, I just pushed additional scripts that can crop and extract poses from in-the-wild portrait images in a way that is compatible with the FFHQ checkpoints. Hope that is useful. https://github.com/NVlabs/eg3d/blob/main/dataset_preprocessing/ffhq/preprocess_in_the_wild.py

Thank you for the code. I tried preprocess_in_the_wild.py, but I'm getting weird random crops from all over the images: [screenshot]

Deep3DFaceRecon itself works fine, as its 3D face prediction results are close to those reported in the paper. [screenshot]

luminohope commented 2 years ago

@usmancheema89 can you send me a few of these original pictures?

usmancheema89 commented 2 years ago

Sure :) please find attached the zip file.

sample Images.zip

blandocs commented 2 years ago

@bernakabadayi

Sorry for the confusion, I used the 00002.png camera parameters. But I'm not sure this mix-up is the problem.

After using the code from the author, I was able to fully resolve it :)

luminohope commented 2 years ago

@usmancheema89 It seems most likely that the landmarks from MTCNN are failing for the incorrectly aligned examples. Below are some debugging images; the ones where the landmarks look correct appear to be aligned correctly. [image] The landmark detector doesn't have to be MTCNN if you find something more robust; any 68-point or 5-point detector should work.
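If you switch to a 68-point detector, one possible mapping down to the 5 points used here is sketched below; the index choices follow the common iBUG 68-point annotation and are my assumption, not code from this repo.

import numpy as np

def landmarks_68_to_5(lm68: np.ndarray) -> np.ndarray:
    # lm68: array of shape (68, 2) -> (5, 2): left eye, right eye, nose, mouth corners
    left_eye = lm68[36:42].mean(axis=0)   # left-eye contour average
    right_eye = lm68[42:48].mean(axis=0)  # right-eye contour average
    nose = lm68[30]                       # nose tip
    mouth_left = lm68[48]                 # left mouth corner
    mouth_right = lm68[54]                # right mouth corner
    return np.stack([left_eye, right_eye, nose, mouth_left, mouth_right])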

usmancheema89 commented 2 years ago

@luminohope Thanks for figuring out the issue. I assumed the landmark points detected by Deep3DFaceRecon were being used downstream; they seemed to be working fine, which confused me. I have tested insightface's landmark detection and that seems to work. I will try it out and update :)

ghost commented 2 years ago

> The landmark detector doesn't have to be MTCNN if you find something more robust; any 68-point or 5-point detector should work.

Hi @usmancheema89, I have the same problem as you: lots of images were incorrectly aligned. Did you find a solution for this? Thanks a lot!

usmancheema89 commented 2 years ago

@belgacemi I didn't explore the issue further as I got pulled into other work. But as @luminohope suggested, you can swap in a different facial landmark detection network for improved results.