FeiiYin / SPI

[CVPR 2023] SPI: 3D GAN Inversion with Facial Symmetry Prior

How are the camera parameters extracted from the input images? #3

Open LPHFAQ opened 7 months ago

LPHFAQ commented 7 months ago

Hi, when I ran this project I found it cool that the camera parameters (pose, angles, and intrinsics) can be extracted and saved under "./test/dataset/c/". But I'm confused about how the camera parameters are extracted. I checked your paper and found this passage about camera parameters: [screenshot]. I then referred to reference [9], "Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set", but it contains no description of extracting camera pose or intrinsics. Could you explain how the camera parameters are extracted from the input face images? (PS: I looked through the project code but failed to understand how the camera pose and intrinsics are extracted; my guess is that the camera parameters are computed from the 3DMM coefficients, i.e., the pose of the face?)

FeiiYin commented 7 months ago

Yes, the camera parameters are calculated from the 3DMM coefficients. The 3DMM pose parameters provide the three rotation angles. You can refer to the original repo or the EG3D repo for the details.
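For context, a rough sketch of what this conversion typically looks like. The axis order, sign conventions, and the orbit radius of 2.7 are assumptions based on EG3D's FFHQ setup, not SPI's exact code:

```python
import numpy as np

def euler_to_rotation(pitch, yaw, roll):
    """Rotation matrix from 3DMM pose angles (radians).
    Composition order Rz @ Ry @ Rx is an assumption; Deep3DFaceRecon
    composes per-axis rotations in a similar (but not identical) way."""
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(pitch), -np.sin(pitch)],
                   [0, np.sin(pitch),  np.cos(pitch)]])
    Ry = np.array([[ np.cos(yaw), 0, np.sin(yaw)],
                   [0, 1, 0],
                   [-np.sin(yaw), 0, np.cos(yaw)]])
    Rz = np.array([[np.cos(roll), -np.sin(roll), 0],
                   [np.sin(roll),  np.cos(roll), 0],
                   [0, 0, 1]])
    return Rz @ Ry @ Rx

def pose_to_cam2world(pitch, yaw, roll, radius=2.7):
    """Place the camera on a sphere of fixed radius around the head
    (radius=2.7 matches EG3D's FFHQ convention) and orient it with the
    3DMM rotation. Sign/axis conventions here are illustrative only."""
    R = euler_to_rotation(pitch, yaw, roll)
    cam2world = np.eye(4)
    cam2world[:3, :3] = R
    cam2world[:3, 3] = R @ np.array([0.0, 0.0, radius])  # camera position
    return cam2world
```

EG3D conditions the generator on a 25-dim label (flattened 4x4 cam2world plus flattened 3x3 intrinsics), so a matrix like the one above is what ultimately ends up in the saved camera files.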

LPHFAQ commented 7 months ago

> Yes, camera parameters are calculated from the 3dmm coefficient. 3DMM provide the 3 rotation degrees in the pose parameters. You can refer the original repo or EG3D repo for the details.

Thanks for your reply! There is another point that confuses me: the camera intrinsics. The intrinsics are defined as fx = fy = 2985.29 and cx = cy = 512 in cal_camera() in extract_camera.py, but in fix_intrinsic() in process_camera.py, fx and fy are divided by 700 while cx and cy end up at 0.5 (i.e., they seem to be divided by 1024). In my view, the aligned image is 1024x1024 before cropping and resizing (shown in the following screenshot of SPI's source code), and this 1024x1024 image is treated as captured by a (perhaps virtual) camera with intrinsics (fx, fy, cx, cy) = (2985.29, 2985.29, 512, 512). If so, when normalizing the intrinsics to make them independent of image size during training, every entry should be divided by the image size, which here is 1024. But it seems that fx and fy are divided by 700 while cx and cy are divided by 1024 (and these values are saved in target.npy). Could you please explain why fx and fy are divided by 700? [screenshots]
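For reference, the two divisors described above produce the following normalized matrix. Notably, 2985.29 / 700 ≈ 4.2647, which is exactly the focal value in EG3D's released FFHQ camera labels, so the mixed divisors appear to be inherited from EG3D's preprocessing (this is a sketch of the arithmetic, not SPI's exact code):

```python
import numpy as np

# Raw intrinsics saved by extract_camera.py (pixels, 1024x1024 aligned image).
fx = fy = 2985.29
cx = cy = 512.0

# Normalization as described in the thread: focal divided by 700,
# principal point divided by 1024.
K_norm = np.array([
    [fx / 700.0, 0.0,        cx / 1024.0],
    [0.0,        fy / 700.0, cy / 1024.0],
    [0.0,        0.0,        1.0],
])
print(K_norm[0, 0])  # 4.2647... — matches EG3D's FFHQ focal length
print(K_norm[0, 2])  # 0.5
```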

LPHFAQ commented 7 months ago

And how is the number 2985.29 obtained? I found the formula in extract_camera.py, shown in the following figure, but I'm a newbie to 3D GAN inversion and I can't understand this formula... [screenshot]
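The exact formula from the screenshot can't be reproduced here, but the standard pinhole relation between focal length and field of view may help interpret the number. The reference sizes 1024 and 700 are the ones discussed above; which one the focal is actually defined against is the open question in this thread:

```python
import math

def focal_from_fov(fov_deg, width):
    # Standard pinhole relation: fx = (W / 2) / tan(FOV / 2)
    return (width / 2.0) / math.tan(math.radians(fov_deg) / 2.0)

def fov_from_focal(fx, width):
    # Inverse relation: FOV = 2 * atan((W / 2) / fx)
    return math.degrees(2.0 * math.atan((width / 2.0) / fx))

print(fov_from_focal(2985.29, 1024))  # ~19.5 deg if referenced to the 1024 px image
print(fov_from_focal(2985.29, 700))   # ~13.4 deg if referenced to a 700 px crop
```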