l4rz opened this issue 2 years ago
Hi l4rz,
We're not planning on releasing the pose-extraction code as part of this repo but you can use Deep3DFaceRecon_pytorch to crop the images and extract camera parameters. If you email me directly, I can give you the modifications you'll need if you want to export EG3D dataset files.
Eric
Hi, Eric,
I followed this link and got the angles and translation using Deep3DFaceRecon_pytorch, and I compute the rotation matrix using the following function:
import numpy as np

def compute_rotation(angles):
    x, y, z = angles
    rot_x = np.array([
        [1, 0, 0],
        [0, np.cos(x), -np.sin(x)],
        [0, np.sin(x), np.cos(x)]
    ])
    rot_y = np.array([
        [np.cos(y), 0, np.sin(y)],
        [0, 1, 0],
        [-np.sin(y), 0, np.cos(y)]
    ])
    rot_z = np.array([
        [np.cos(z), -np.sin(z), 0],
        [np.sin(z), np.cos(z), 0],
        [0, 0, 1]
    ])
    return np.matmul(rot_z, np.matmul(rot_y, rot_x))
The rotation matrices I get are different from the rotation matrices you provide in dataset.json. May I know the angle-to-rotation-matrix function you used when you processed the data? It seems that I need to multiply the second and third rows by -1 to get the same results.
In summary, I wonder how to get the 4x4 extrinsics from the angles and translation estimated with Deep3DFaceRecon_pytorch.
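For what it's worth, the row-flip observation above can be written as a tiny sketch (purely illustrative, not EG3D's actual conversion): negating the second and third rows is the same as left-multiplying by diag(1, -1, -1), i.e. a 180-degree rotation about the x-axis.

import numpy as np

# Illustration of the observation above: flipping the sign of rows 2 and 3 of the
# Deep3DFaceRecon rotation matrix is equivalent to left-multiplying by diag(1, -1, -1).
FLIP_YZ = np.diag([1.0, -1.0, -1.0])

def flip_rows(rot_from_deep3dfacerecon: np.ndarray) -> np.ndarray:
    return FLIP_YZ @ rot_from_deep3dfacerecon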
Hi Eric,
I also have questions about the pose-extraction part: it seems the coordinate origin of Deep3DFaceRecon_pytorch is different from EG3D's. Could you provide the translation vector between them? I am not sure whether they differ only in the origin; the axis orientation may differ as well. If so, the whole 4x4 transformation matrix is needed.
Thank you!
Hi, I get similar results using the following code. I am not sure it's correct, but the 4x4 extrinsics look similar to the ones in the provided dataset.json file. It seems that the camera distance is assumed to be fixed at 2.7. You can give it a try; I hope the code helps.
import numpy as np


def compute_rotation(angles):
    x, y, z = angles
    rot_x = np.array([
        [1, 0, 0],
        [0, np.cos(x), -np.sin(x)],
        [0, np.sin(x), np.cos(x)]
    ])
    rot_y = np.array([
        [np.cos(y), 0, np.sin(y)],
        [0, 1, 0],
        [-np.sin(y), 0, np.cos(y)]
    ])
    rot_z = np.array([
        [np.cos(z), -np.sin(z), 0],
        [np.sin(z), np.cos(z), 0],
        [0, 0, 1]
    ])
    return np.matmul(rot_z, np.matmul(rot_y, rot_x))


def get_extrinsics_from_euler_and_translation(euler: np.ndarray, trans: np.ndarray):
    """Euler angles and translation are estimated with Deep3DFaceRecon_pytorch."""
    theta_x, theta_y, theta_z = euler[0], euler[1], euler[2]
    # Empirical sign/offset adjustments to match the convention in dataset.json.
    theta_x = np.pi - theta_x
    theta_y = -theta_y
    theta_z = theta_z
    rot_mat = compute_rotation([theta_x, theta_y, theta_z])
    trans_x = -trans[0]
    trans_y = trans[1]
    # The camera distance from the origin is assumed to be fixed at 2.7.
    trans_z = np.sqrt(2.7 ** 2 - trans_x ** 2 - trans_y ** 2)
    trans_new = np.matmul(rot_mat, np.array([trans_x, trans_y, trans_z]))
    mat_4x4 = np.eye(4)
    mat_4x4[0:3, 0:3] = rot_mat
    mat_4x4[0:3, 3] = -trans_new
    return mat_4x4


if __name__ == "__main__":
    test_fname = '00039.png'
    euler_estimated = np.array([0.15566148, -0.18466546, -0.01471091])
    trans_estimated = np.array([0.03106074, 0.02740563, 0.07686124])
    mat_4x4 = get_extrinsics_from_euler_and_translation(euler_estimated, trans_estimated)
    print(mat_4x4)
Hi @X-niper,
I also noticed this issue: the rotation matrix obtained directly from compute_rotation is different from the extrinsics provided in dataset.json. Could you please explain the convention differences between Deep3DFaceRecon_pytorch and EG3D, and say a bit more about how your code bridges this gap?
I am a newbie just starting out in 3D vision. Thanks in advance.
Hi @bd20222,
We can get the Euler angles from the extrinsics in dataset.json. I compared the derived Euler angles with the ones estimated by Deep3DFaceRecon_pytorch and looked at the differences. I found that the camera distance is fixed at 2.7 when reading the generate_sample.py code, and the guess is validated by the translation obtained from the extrinsics in dataset.json.
I attach the code below, with which you can get the Euler angles from a rotation matrix.
import numpy as np


def compute_angle_from_matrix(matrix3x3):
    M = matrix3x3
    theta_y = np.arcsin(-M[2, 0])
    theta_z = np.arctan2(M[1, 0], M[0, 0])
    theta_x = np.arctan2(M[2, 1], M[2, 2])
    return (theta_x, theta_y, theta_z)
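For example (a small sanity-check sketch, not part of the official pipeline), the function above can be applied to a cam2world extrinsic taken from dataset.json to recover the Euler angles and to confirm that the camera sits roughly 2.7 units from the origin:

def inspect_extrinsic(cam2world_4x4: np.ndarray):
    """Recover Euler angles and camera distance from a 4x4 cam2world extrinsic."""
    rotation = cam2world_4x4[:3, :3]
    camera_origin = cam2world_4x4[:3, 3]
    angles = compute_angle_from_matrix(rotation)
    distance = np.linalg.norm(camera_origin)  # close to 2.7 for the FFHQ labels
    return angles, distance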
@X-niper
Thanks, I understand the general steps. Can you then explain the meaning of
theta_x = np.pi - theta_x
theta_y = -theta_y
theta_z = theta_z
maybe through an intuitive example? I guess it has something to do with the order and orientation of the axes.
I also found that the implementation of the LookAtPoseSampler class in camera_utils.py is similar to your angle transformation.
class LookAtPoseSampler:
    """
    Same as GaussianCameraPoseSampler, except the
    camera is specified as looking at 'lookat_position', a 3-vector.

    Example:
    For a camera pose looking at the origin with the camera at position [0, 0, 1]:
    cam2world = LookAtPoseSampler.sample(math.pi/2, math.pi/2, torch.tensor([0, 0, 0]), radius=1)
    """

    @staticmethod
    def sample(horizontal_mean, vertical_mean, lookat_position, horizontal_stddev=0, vertical_stddev=0, radius=1, batch_size=1, device='cpu'):
        h = torch.randn((batch_size, 1), device=device) * horizontal_stddev + horizontal_mean
        v = torch.randn((batch_size, 1), device=device) * vertical_stddev + vertical_mean
        v = torch.clamp(v, 1e-5, math.pi - 1e-5)

        theta = h
        v = v / math.pi
        phi = torch.arccos(1 - 2*v)

        camera_origins = torch.zeros((batch_size, 3), device=device)
        camera_origins[:, 0:1] = radius*torch.sin(phi) * torch.cos(math.pi-theta)
        camera_origins[:, 2:3] = radius*torch.sin(phi) * torch.sin(math.pi-theta)
        camera_origins[:, 1:2] = radius*torch.cos(phi)

        # forward_vectors = math_utils.normalize_vecs(-camera_origins)
        forward_vectors = math_utils.normalize_vecs(lookat_position - camera_origins)
        return create_cam2world_matrix(forward_vectors, camera_origins)
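The docstring above already shows the intended usage; a minimal usage sketch (assuming camera_utils.py from this repo is importable, and using the fixed camera distance of 2.7 discussed earlier) might look like this:

import math
import torch
from camera_utils import LookAtPoseSampler  # assumes the eg3d code is on PYTHONPATH

# Sample a frontal cam2world pose at the fixed FFHQ camera distance of 2.7.
# horizontal_mean = vertical_mean = pi/2 places the camera at [0, 0, radius],
# looking back at the origin (see the docstring example above).
cam2world = LookAtPoseSampler.sample(
    math.pi / 2, math.pi / 2,
    lookat_position=torch.tensor([0.0, 0.0, 0.0]),
    radius=2.7,
)
print(cam2world.shape)  # torch.Size([1, 4, 4])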
Hi @X-niper,
I followed Deep3DFace and extracted the euler and trans coefficients, but the result I got for 00039 is different from yours in the code above. I wonder how you got the euler and trans values?
I use a facial landmark detector to extract landmarks and follow the instructions in https://github.com/sicxu/Deep3DFaceRecon_pytorch#test-with-custom-images.
The facerecon network returns the output coefficients (output_coeff = self.net_recon(self.input_img)), from which we can get euler and trans.
Is there anything wrong with my process?
@41xu Hi, I use the original ffhq cropped image. I use mtcnn to detect facial landmarks.
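In case it helps anyone reproducing this step, here is a rough sketch (my own, not the authors' script) of dumping MTCNN's 5-point landmarks into the detections/*.txt format that Deep3DFaceRecon_pytorch's test-with-custom-images instructions describe; the landmark order and file layout below are assumptions and should be checked against that README.

import os
import numpy as np
from PIL import Image
from mtcnn import MTCNN  # pip install mtcnn

def write_five_point_landmarks(image_path: str, out_dir: str = 'detections'):
    """Detect 5 facial landmarks with MTCNN and write one 'x y' pair per line."""
    os.makedirs(out_dir, exist_ok=True)
    img = np.array(Image.open(image_path).convert('RGB'))
    faces = MTCNN().detect_faces(img)
    if not faces:
        raise RuntimeError(f'no face detected in {image_path}')
    keypoints = faces[0]['keypoints']
    # Assumed order: left eye, right eye, nose, left mouth corner, right mouth corner.
    order = ['left_eye', 'right_eye', 'nose', 'mouth_left', 'mouth_right']
    name = os.path.splitext(os.path.basename(image_path))[0]
    with open(os.path.join(out_dir, name + '.txt'), 'w') as f:
        for key in order:
            x, y = keypoints[key]
            f.write(f'{x} {y}\n')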
Hi @ericryanchan, thanks for your great work! I have recently been doing inversion with your framework and am getting rather good results on the FFHQ dataset with the cropping method and camera pose data from runme.py. Would it be possible to share the code for cropping and computing the camera pose of in-the-wild images with me through email?
I also want to know how to extract the camera pose of in-the-wild images. Please let me know if you have any ideas, thank you. Using the exact output coefficients of Deep3DRecon directly does not seem right.
Hi @ericryanchan and @X-niper, thank you for the great work and comments :) I read all of the comments in this thread and ran various experiments, but I failed to find a solution.
My goal is to generate dataset.json (which contains the extrinsic matrix) for arbitrary images. The steps below are what I've done so far.
Step 1. Get coeffs from Deep3DRecon (also used mtcnn to detect the face).
# By using facerecon_model.py we can get the coeff
output_coeff = self.net_recon(self.input_img)
result = self.facemodel.split_coeff(output_coeff)
# get euler angle: result['angle']
# get translation info: result['trans']
Step 2. Get the extrinsic matrix using @X-niper's code.
mat_4x4 = get_extrinsics_from_euler_and_translation(result['angle'], result['trans'])
Step 3. Compare my mat_4x4 result with the ground-truth extrinsic matrix in EG3D's dataset.json. I followed EG3D's FFHQ preprocessing and used exactly the same image that EG3D used.
Unfortunately, the extrinsic matrices are similar but different from each other, and accordingly the rendered image using my extrinsics is also different from the original one. I used the 00001.png image.
# my estimated 4x4 extrinsic matrix:
[[ 0.98808561 -0.03921765 0.14882481 -0.40602439]
[-0.00870278 -0.97967942 -0.20038081 0.550637 ]
[ 0.15365908 0.19669821 -0.96834847 2.61188504]
[ 0. 0. 0. 1. ]]
# euler angle: (2.941191690164157, -0.1542702688887573, -0.008807490044743849)
# ground-truth 4x4 extrinsic matrix from dataset.json(EG3D):
[[ 0.98884588 0.01061465 0.14856379 -0.37894215]
[ 0.04037552 -0.97921258 -0.19877771 0.5142959 ]
[ 0.14336558 0.20255886 -0.96871883 2.62333806]
[ 0. 0. 0. 1. ]]
# euler angle: (2.9354628454139498, -0.1438612908670918, 0.04080828469625245)
Left: inversion result rendered using the ground-truth extrinsic parameters from dataset.json (EG3D). Right: inversion result rendered using my estimated extrinsic parameters.
Could you give me any advice or camera-matrix extraction code?
Thank you in advance and have a great day!
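As a quick way to quantify "similar but different", the two matrices quoted above can be compared numerically (illustrative only, using the values posted in this comment):

import numpy as np

# The two 4x4 extrinsics quoted above (estimated vs. dataset.json ground truth).
est = np.array([[ 0.98808561, -0.03921765,  0.14882481, -0.40602439],
                [-0.00870278, -0.97967942, -0.20038081,  0.550637  ],
                [ 0.15365908,  0.19669821, -0.96834847,  2.61188504],
                [ 0.,          0.,          0.,          1.        ]])
gt = np.array([[ 0.98884588,  0.01061465,  0.14856379, -0.37894215],
               [ 0.04037552, -0.97921258, -0.19877771,  0.5142959 ],
               [ 0.14336558,  0.20255886, -0.96871883,  2.62333806],
               [ 0.,          0.,          0.,          1.        ]])

# Geodesic angle between the two rotation blocks, plus the camera-origin offset.
relative_rotation = est[:3, :3].T @ gt[:3, :3]
angle_deg = np.degrees(np.arccos(np.clip((np.trace(relative_rotation) - 1) / 2, -1.0, 1.0)))
origin_offset = np.linalg.norm(est[:3, 3] - gt[:3, 3])
print(f'rotation difference: {angle_deg:.2f} deg, camera-origin offset: {origin_offset:.3f}')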
# euler angle: (2.9354628454139498, -0.1438612908670918, 0.04080828469625245)
same
Same here. While doing inversion on CelebA-HQ, I used @X-niper's code to get the extrinsics and fed them to the generator.
Left: GT. Middle: using @X-niper's code. Right: using a frontal-face GT extrinsic from dataset.json (picked randomly). I used a look-at extrinsic for visualization. It is strange that the GT extrinsics from dataset.json work well (the same holds for other samples).
@ericryanchan, could you please share the code for calculating the 4x4 extrinsic from the Euler angles and translation?
Hi @blandocs, @41xu, @mlnyang,
I am sorry to say that my guess in the get_extrinsics_from_euler_and_translation function is not exactly the same as what Eric uses in this work. Please email Eric to get the code for this step; we cannot derive exactly the same process without the code, and since it is in Eric's private repo, I can't copy it here.
Besides, please follow the runme.py script in this repo to crop your face data before estimating poses (you can see that one function is imported from Deep3DFaceRecon_pytorch); then the poses can be extracted correctly.
Best of luck!
Hi @blandocs, the matrix I got from dataset.json for that image is as follows, and it is different from your GT. Where did you get your file? Mine is from here: https://github.com/NVlabs/eg3d/blob/0b38adcc2bed6b4fda922efd6ec747e1216dc1fd/dataset_preprocessing/ffhq/runme.py#L70
['00001.png', [0.9950790405273438, 0.016893357038497925, -0.09763375669717789, 0.24722558007395418, 0.00013333745300769806, -0.9855860471725464, -0.16917484998703003, 0.45264816608040714, -0.09908440709114075, 0.16832932829856873, -0.9807382822036743, 2.6502809568612045, 0.0, 0.0, 0.0, 1.0, 4.2647, 0.0, 0.5, 0.0, 4.2647, 0.5, 0.0, 0.0, 1.0]]
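For anyone unpacking entries like the one above: each label is 25 floats, which appear to be a flattened 4x4 cam2world extrinsic followed by a flattened 3x3 (normalized) intrinsic matrix. A small parsing sketch, assuming the usual {'labels': [[name, values], ...]} layout of dataset.json:

import json
import numpy as np

def load_camera_label(dataset_json_path: str, image_name: str):
    """Return (4x4 cam2world, 3x3 intrinsics) for one image from dataset.json."""
    with open(dataset_json_path) as f:
        labels = dict(json.load(f)['labels'])   # list of [name, 25 floats] pairs
    label = np.asarray(labels[image_name], dtype=np.float64)
    assert label.shape == (25,)
    cam2world = label[:16].reshape(4, 4)
    intrinsics = label[16:].reshape(3, 3)       # e.g. focal 4.2647, principal point 0.5
    return cam2world, intrinsics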
Hi all, I just pushed additional scripts that can crop and extract poses from in-the-wild portrait images in a way that is compatible with the FFHQ checkpoints. Hope that is useful. https://github.com/NVlabs/eg3d/blob/main/dataset_preprocessing/ffhq/preprocess_in_the_wild.py
If I am using objects other than faces, can I still use Deep3DFaceRecon_pytorch to generate the camera matrix? How would I get a camera matrix in that case? Thanks.
Thank you for the code. I tried preprocess_in_the_wild, but I'm getting weird random crops from all over the images:
Deep3DFaceRecon itself works fine, as its 3D face prediction results are close to those reported in the paper.
@usmancheema89 can you send me a few of these original pictures?
Sure :) please find attached the zip file.
@bernakabadayi
Sorry for the confusion, I used the 00002.png camera parameters. But I'm not sure this mix-up is the problem.
After using the code from the author, I was able to resolve it completely :)
@usmancheema89 it seems most likely that the landmarks from MTCNN are failing for the incorrectly aligned examples. Below are some debugging images; the ones where the landmarks look correct appear to be aligned correctly. The landmark detector doesn't have to be MTCNN if you find something else that is more robust; any 68-point or 5-point based detector should work.
@luminohope Thanks for figuring out the issue. I assumed the landmark points detected by Deep3DFaceRecon were being used downstream, and those seemed to be working fine, which confused me. I have tested insightface's landmark detection and that seems to be working. I will test it out and update :)
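A rough sketch of what that swap could look like (the FaceAnalysis usage below is my reading of insightface's API, and the 5-point order is assumed to match the left-eye/right-eye/nose/mouth order used above; double-check both before relying on it):

import cv2
import numpy as np
from insightface.app import FaceAnalysis  # pip install insightface onnxruntime

app = FaceAnalysis()
app.prepare(ctx_id=0, det_size=(640, 640))

def five_point_landmarks(image_path: str) -> np.ndarray:
    """Return a (5, 2) array of landmarks from insightface instead of MTCNN."""
    img = cv2.imread(image_path)  # BGR, as insightface expects
    faces = app.get(img)
    if not faces:
        raise RuntimeError(f'no face detected in {image_path}')
    # Assumed order: left eye, right eye, nose, left mouth corner, right mouth corner.
    return faces[0].kps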
Hi @usmancheema89, I have the same problem as you: lots of images were incorrectly aligned. Did you find a solution for this? Thanks a lot!
@belgacemi I didn't explore the issue further, as I moved on to other work. But as @luminohope suggested, you may change the face landmark detection network for improved results.
"We use an off-the-shelf face detection and pose-extraction pipeline to both identify the face region and label the image with a pose"
Would it be possible to add the code to generate camera pose labels ("cameras.json") to this repo?