anuragranj opened this issue 2 years ago
Yes, I also want to know. Have you figured out how to generate AFHQ camera poses, or is there a script for that? Thanks.
Same question
Based on their paper, the authors refer to this repo for 2D landmark detection of cats, and then to OpenCV's Perspective-n-Point (PnP) implementation for estimating camera poses that map 3D points to their 2D correspondences.
Thus, it seems that the ground-truth 3D landmarks are missing?
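For reference, a minimal sketch of that PnP step with OpenCV could look like the following. Note the 3D template coordinates below are made-up placeholders (the authors' actual 3D template is not released), and the focal length is a rough guess for a 512px crop:

```python
import cv2
import numpy as np

# Hypothetical 3D template of cat-face landmarks (placeholder values).
object_points = np.array([
    [-0.3,  0.3,  0.0],   # left eye
    [ 0.3,  0.3,  0.0],   # right eye
    [ 0.0,  0.0,  0.3],   # nose tip
    [-0.5,  0.6, -0.2],   # left ear base
    [ 0.5,  0.6, -0.2],   # right ear base
    [ 0.0, -0.3,  0.1],   # chin
], dtype=np.float64)

# Matching 2D detections from a landmark detector, in pixel coordinates.
image_points = np.array([
    [180., 200.], [330., 205.], [255., 290.],
    [120., 110.], [390., 115.], [255., 380.],
], dtype=np.float64)

# Pinhole intrinsics; 4.2647 is EG3D's normalized focal length, scaled
# here to pixels for a 512x512 crop.
f, cx, cy = 512.0 * 4.2647, 256.0, 256.0
K = np.array([[f, 0, cx], [0, f, cy], [0, 0, 1]], dtype=np.float64)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
R, _ = cv2.Rodrigues(rvec)   # world-to-camera rotation
cam2world_R = R.T            # invert to get camera-to-world
cam2world_t = -R.T @ tvec    # camera position in world coordinates
print(cam2world_R, cam2world_t.ravel())
```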
Okay, for those who might be interested, I found two possible solutions: regressing the pose directly with a 6DRepNet head-pose model (code below), or following the paper's landmark-detection + PnP strategy.
```python
import torch
import numpy as np
from torchvision import transforms
from PIL import Image
from model import SixDRepNet  # from the 6DRepNet repo

device = 'cuda'

# Standard ImageNet preprocessing expected by the RepVGG backbone.
normalize = transforms.Normalize(
    mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225],
)
transformations = transforms.Compose([
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1)),
    transforms.ToTensor(),
    normalize,
])

model = SixDRepNet(backbone_name='RepVGG-B1g2', backbone_file='', deploy=True, pretrained=False)
model.requires_grad_(False)
model.load_state_dict(torch.load('./checkpoint.pth'))
model.to(device).eval()

@torch.no_grad()
def convert_predicted_cam(cam: torch.Tensor):
    # 6DRepNet predicts a (B, 3, 3) rotation matrix; flip the y- and
    # z-axes to move it into the camera convention EG3D uses.
    Rot = torch.eye(3, device=cam.device)
    Rot[0, 0] = 1
    Rot[1, 1] = -1
    Rot[2, 2] = -1
    cam = cam.permute(0, 2, 1)
    return cam @ Rot[None, ...]

def gen_pose(rot_mat):
    # Put the camera on a sphere of radius 2.7 looking at the origin,
    # and assemble the 4x4 cam2world extrinsics matrix.
    rot_mat = np.array(rot_mat).copy()
    forward = rot_mat[:, 2]
    translation = forward * -2.7
    pose = np.array([
        [rot_mat[0, 0], rot_mat[0, 1], rot_mat[0, 2], translation[0]],
        [rot_mat[1, 0], rot_mat[1, 1], rot_mat[1, 2], translation[1]],
        [rot_mat[2, 0], rot_mat[2, 1], rot_mat[2, 2], translation[2]],
        [0, 0, 0, 1],
    ])
    return pose

def gen_label(out: torch.Tensor):
    # EG3D labels = flattened 4x4 extrinsics + flattened 3x3 normalized
    # intrinsics (principal point 0.5, focal length 4.2647) = 25 floats.
    pose = gen_pose(convert_predicted_cam(out).squeeze().cpu().numpy())
    intrinsics = np.array([
        [4.2647, 0.0, 0.5],
        [0.0, 4.2647, 0.5],
        [0.0, 0.0, 1.0],
    ])
    return np.concatenate([pose.reshape(-1), intrinsics.reshape(-1)]).tolist()

image = Image.open('/path/to/cat_face.png').convert('RGB')
image = transformations(image).to(device)
raw = model(image[None, ...])
cam = gen_label(raw)
print('Label:', cam)
```
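If you want labels for a whole folder, a loop like the following (reusing the names from the snippet above; the folder path is a placeholder) writes them in the `dataset.json` format that, as far as I can tell, EG3D's `ImageFolderDataset` reads:

```python
import json, glob, os

# Run the pose model over every crop and collect [filename, 25-float label] pairs.
labels = []
for path in sorted(glob.glob('./afhq_cats/*.png')):
    img = transformations(Image.open(path).convert('RGB')).to(device)
    labels.append([os.path.basename(path), gen_label(model(img[None, ...]))])

with open('./afhq_cats/dataset.json', 'w') as f:
    json.dump({'labels': labels}, f)
```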
It seems the paper made a citation mistake:

> Camera poses were extracted via landmark detection [20] and an open-source Perspective-n-Point algorithm [3]. We augment the dataset with horizontal flips.

It should be "Camera poses were extracted via landmark detection [32]".
@RaymondJiangkw thanks for the 6DRepNet checkpoint, do you happen to know how to crop in-the-wild cat images like eg3d does for in-the-wild celeb images?
I have tried some approaches, but none of them give reasonable and consistent results for really diverse cat images... The strategy mentioned in their paper, i.e., using landmark detection and PnP to solve for the camera parameters, may be an option for you.
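As a starting point, one simple heuristic is to crop a square around the detected landmarks. This is just a rough sketch, not EG3D's actual pipeline (which is not released for cats); the landmark array, padding factor, and output size are placeholder choices:

```python
import numpy as np
from PIL import Image

def crop_around_landmarks(img: Image.Image, lm: np.ndarray,
                          pad: float = 1.8, out_size: int = 512) -> Image.Image:
    """Square crop centered on the landmark centroid.

    lm: (N, 2) array of 2D landmarks in pixel coordinates.
    pad: how much larger the crop is than the landmark bounding box.
    """
    center = lm.mean(axis=0)
    extent = (lm.max(axis=0) - lm.min(axis=0)).max()  # largest bbox side
    half = 0.5 * pad * extent
    box = (int(round(center[0] - half)), int(round(center[1] - half)),
           int(round(center[0] + half)), int(round(center[1] + half)))
    # PIL pads with black if the box extends past the image border.
    return img.crop(box).resize((out_size, out_size), Image.LANCZOS)

# Usage with hypothetical landmarks from any cat-face detector:
# img = Image.open('cat.jpg').convert('RGB')
# lm = detector(img)  # (N, 2) pixel coordinates
# crop = crop_around_landmarks(img, lm)
```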
Very nice work. How did you generate the camera poses for the AFHQ dataset?