aligning meshes with images

treder commented 1 year ago

I want to bring the meshes into image space but the results are still off a bit. As an example I will show images from m--20190828--1318--002645310--GHS/images/EXP_free_face

The following transformations are available:

intrinsics matrix K and extrinsics RT in the KRT file
mesh transform from the [...]_transform.txt files in the tracked_mesh folder

The following figure shows the vertices

first row: original vertices (only x and y plotted, z is omitted). The unit is in mm (as far as I understand)
second row: vertices after applying the mesh transform. The mesh transform brings the meshes into an object-centric space; it is not relevant here, so it is omitted from the rest.

The next figure shows the operations on vertices v after applying intrinsics K and extrinsics RT. Note that the vertex coordinates are homogeneous (a column of 1's is appended) and the matrices are extended/transposed accordingly.

row 1: RT * v. This brings the data into camera space, unit still mm. We can see that the head pose matches the input images.
row 2: K * RT * v (without correction by dividing by z-value)
row 3: K * RT * v then correction with z coordinate is performed (dividing x and y by z). Then the x and y coordinates are shifted by w//2 and h//2. This seems to bring data into image space and align it.

Now we can take the meshes from the last row and superimpose them on the images to see the match. The match is noticeably off. Same for another example further down. So the following questions:

Is the computation correct or am I missing something?
Can you provide the correct formula?
If you have a function that performs [vertices_in_world_space, K, RT] -> [vertices_in_image_space] do you mind sharing it?

Thanks!

vexilligera commented 1 year ago

Hi, the problem in your case might be the shifting step. KRT already projects 3D points into 2D image space.

you may refer to the code in data loader:

transf = np.genfromtxt(
    "{}/{}/{}_transform.txt".format(self.meshpath, sentnum, frame)
)
R_f = transf[:3, :3]
t_f = transf[:3, 3]
campos = np.dot(R_f.T, self.campos[cam] - t_f).astype(np.float32)
view = campos / np.linalg.norm(campos)

extrin, intrin = self.krt[cam]["extrin"], self.krt[cam]["intrin"]
R_C = extrin[:3, :3]
t_C = extrin[:3, 3]
camrot = np.dot(R_C, R_f).astype(np.float32)
camt = np.dot(R_C, t_f) + t_C
camt = camt.astype(np.float32)

M = intrin @ np.hstack((camrot, camt[None].T))

and then use the matrix M with the denormalized vertex coordinates in homogeneous coordinates, e.g., with vert_homo of shape (N, 4),

projected = vert_homo @ M.T
projected_2d = projected / projected[:, -1:]

This should give you 2D locations in the original image resolution (2048 x 1334), which looks like this: [image]
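
The composition in the data loader snippet can be checked on made-up numbers. The sketch below is not taken from the repository: `project_points` is a hypothetical helper, and `K`, `R_C`, `t_C`, `R_f`, `t_f` are synthetic values chosen only for illustration. It shows that projecting with the combined matrix M gives the same pixels as first applying the mesh transform and then projecting with K and the extrinsics, with no extra shift by w//2 and h//2, because the principal point in K already supplies the offset.

```python
import numpy as np


def project_points(verts, M):
    """Project (N, 3) points to (N, 2) pixel coordinates with a 3x4 matrix M."""
    verts = np.asarray(verts, dtype=np.float64)
    # Append a column of ones -> (N, 4) homogeneous coordinates.
    verts_h = np.hstack([verts, np.ones((verts.shape[0], 1))])
    # Row-vector convention: (N, 4) @ (4, 3) -> (N, 3).
    projected = verts_h @ M.T
    # Perspective divide by the third (depth) component; keep x and y.
    return projected[:, :2] / projected[:, 2:3]


# Synthetic camera (assumption, not values from the KRT file).
K = np.array([[1000.0, 0.0, 320.0],
              [0.0, 1000.0, 240.0],
              [0.0, 0.0, 1.0]])
R_C = np.eye(3)
t_C = np.array([0.0, 0.0, 1000.0])
extrin = np.hstack((R_C, t_C[:, None]))

# Synthetic mesh transform (assumption): 90 degree rotation about z plus a shift.
R_f = np.array([[0.0, -1.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0]])
t_f = np.array([10.0, 0.0, 0.0])

# Same composition as in the data loader snippet.
camrot = R_C @ R_f
camt = R_C @ t_f + t_C
M = K @ np.hstack((camrot, camt[:, None]))

verts = np.array([[0.0, 0.0, 0.0],
                  [100.0, 0.0, 0.0]])

# One step: combined matrix M.
pixels = project_points(verts, M)

# Two steps: mesh transform first, then K @ [R_C | t_C].
world = verts @ R_f.T + t_f
pixels_two_step = project_points(world, K @ extrin)

print(pixels)           # [[330. 240.] [330. 340.]]
print(pixels_two_step)  # same values
```

The row-vector form `verts_h @ M.T` is used so that the vertices can stay in the usual (N, 3) layout.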

treder commented 1 year ago

Thanks for the quick reply. You are right, there must have been an error with the shift; your code works. I made a minimal reproducible example (for future reference) using your dataloader:

import numpy as np
import matplotlib.pyplot as plt
from dataset import Dataset 

# instantiate dataset and get sample
data_dir = 'PATH/TO/m--20190828--1318--002645310--GHS/'
multiface = Dataset(
    data_dir,
    krt_dir=data_dir + 'KRT',
    framelistpath=data_dir + 'frame_list_selected.txt',
)
sample = multiface[11]

where frame_list_selected.txt is my selection of frames and the sampled frame is 003251.

# get verts and denormalize
M = sample['M']
verts = sample['aligned_verts']

verts *= multiface.vertstd
verts += multiface.vertmean.reshape((-1, 3))

# homogeneous coordinates and project
vert_homo = np.concatenate((verts, np.ones((verts.shape[0], 1))), axis=1)

projected = vert_homo @ M.T
projected_2d = projected / projected[:, -1:]

# plot the projected vertices on top of the photo
fig = plt.figure(figsize=(20, 12), dpi=100, facecolor='w', edgecolor='k')

plt.imshow(sample['photo'])
plt.plot(projected_2d[:, 0], projected_2d[:, 1], 'r.', markersize=1)
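
Besides the visual overlay, a rough numerical check can catch gross errors such as an extra shift: count how many projected points fall inside the image. This is only a sketch under assumptions. `fraction_in_bounds` is a hypothetical helper, not part of the dataloader, and the points and the 640 x 480 size below are made up; with real data one would pass the x and y columns of `projected_2d` together with the actual image width and height.

```python
import numpy as np


def fraction_in_bounds(points_2d, width, height):
    """Return the fraction of (N, 2) pixel coordinates that lie inside the image."""
    points_2d = np.asarray(points_2d, dtype=np.float64)
    x, y = points_2d[:, 0], points_2d[:, 1]
    inside = (x >= 0) & (x < width) & (y >= 0) & (y < height)
    return float(inside.mean())


# Made-up points for illustration: two inside, two outside a 640 x 480 image.
example = np.array([[10.0, 20.0],
                    [500.0, 300.0],
                    [-5.0, 40.0],
                    [100.0, 5000.0]])
ratio = fraction_in_bounds(example, width=640, height=480)
print(ratio)  # 0.5
```

A value well below 1 for a face that is fully visible in the photo would suggest that the projection is off, although a value of 1 does not by itself prove correct alignment.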

facebookresearch / multiface

aligning meshes with images #33