chenhsuanlin / photometric-mesh-optim

Photometric Mesh Optimization for Video-Aligned 3D Object Reconstruction (CVPR 2019)

Quantitative evaluation #7

Closed pengsongyou closed 4 years ago

pengsongyou commented 4 years ago

Hi Chen-Hsuan,

First of all, thanks so much for this amazing work and the well-structured code!

I am currently trying to reproduce your results in Table 1 with your provided models on plane, car, and chair, but it seems that you have not provided the evaluation code? Instead, the evaluation is only provided in the AtlasNet pretraining part. I'm wondering if I can simply adapt the evaluation code from the AtlasNet pretraining to your PMO code? If not, could you please kindly provide your evaluation code?

Many thanks in advance!

chenhsuanlin commented 4 years ago

Sorry for the late reply. Yes, the evaluation for PMO is the same as for the pretraining part. If you want to follow the metrics I reported in the paper (separate for GT->pred and pred->GT), you can simply report dist1 and dist2 respectively from the chamfer distance function. Many other papers, however, seem to combine the two together (as in the training loss) and report a single error metric, though I personally don't think it makes much sense to do so. Hope this helps!
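(For readers following along, here is a rough brute-force NumPy sketch of what reporting the two terms separately looks like. The chamfer_distance function below is only a stand-in for the CUDA chamfer extension the repo actually uses, and the random point clouds are placeholders.)

import numpy as np

def chamfer_distance(points_gt, points_pred):
    # brute-force pairwise squared distances, shape (N_gt, N_pred)
    sq_dist = ((points_gt[:, None, :] - points_pred[None, :, :]) ** 2).sum(axis=-1)
    dist1 = sq_dist.min(axis=1)  # GT -> pred: nearest predicted point per GT point
    dist2 = sq_dist.min(axis=0)  # pred -> GT: nearest GT point per predicted point
    return dist1, dist2

points_gt = np.random.rand(1000, 3)    # placeholder for points sampled from the GT shape
points_pred = np.random.rand(1000, 3)  # placeholder for points sampled from the predicted mesh
dist1, dist2 = chamfer_distance(points_gt, points_pred)
print("GT->pred:", dist1.mean(), "pred->GT:", dist2.mean())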

pengsongyou commented 4 years ago

Hi Chen-Hsuan,

Thanks for your reply. May I ask one quick question? If I directly use your main.py to evaluate, it can only run one instance at a time because the batch size is 1. Taking the chair class as an example, there are 1356 instances and your method needs ~3 minutes per instance, so the total evaluation time would be ~67 hours, which is far too long. I am therefore wondering if you could still provide your evaluation code so I could run the whole thing in batches for a fair comparison? Thanks so much for the help!

pengsongyou commented 4 years ago

May I also ask how you acquired the GT and pred in your case? If the GT points are sampled from the .ply files provided in the AtlasNet dataset, your var.vertices are apparently not at the same scale as the GT, so the chamfer distance is actually quite large. When you did the quantitative experiments, did you use var.vertices, var.vertices_world, or var.vertices_canon as the prediction for evaluation? Many thanks in advance!!

chenhsuanlin commented 4 years ago

Yes, evaluating the entire test set can take up to a couple of days... it was the same for me. You could try increasing opt.batch_size, but in my case batch size 1 already utilizes the GPU almost to its fullest. The ground truth is taken from the .ply files from AtlasNet. To compare against the ground truth, you should evaluate in the canonical coordinates after applying the refined 3D similarity, i.e. transforming var.vertices with the inverse of cs_map_mtrx.
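(A minimal sketch of that last step, assuming cs_map_mtrx is a 3x4 similarity transform [sR|t] acting on homogeneous points; check its actual shape in the code before relying on this.)

import numpy as np

def to_canonical(vertices, cs_map_mtrx):
    # invert the 3x4 similarity transform and apply it to (N, 3) vertices
    sR = cs_map_mtrx[:, :3]
    t = cs_map_mtrx[:, 3]
    sR_inv = np.linalg.inv(sR)
    return vertices @ sR_inv.T - sR_inv @ t

# vertices_canon = to_canonical(var.vertices.cpu().numpy(), cs_map_mtrx)
# then sample points on the mesh and evaluate against the AtlasNet .ply ground
# truth as in the pretraining evaluation.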

chenhsuanlin commented 4 years ago

Closing for now, please feel free to reopen if there are further issues.

martinruenz commented 4 years ago

Hi there! I have a question along the same lines. Given a ground-truth point cloud from AtlasNet (<id>.points.ply.npy) and the camera parameters from camera.npz, how do I correctly project the point cloud into a frame from sequences/<class>/<id>.npy? It seems that a normalization and a rotation have to be applied to the point cloud, but I end up with a small misalignment with the following strategy: uv = K * T_cw * R_z * R_y * normalize(points), where R_z and R_y are 90-degree rotations and T_cw is the world-to-camera transformation. I tried a couple of normalization approaches but none works exactly. The closest so far is subtracting the mean and then dividing by the max norm of the points. Could you give me a hint about the correct normalization, or am I missing something else?

chenhsuanlin commented 4 years ago

If I remember correctly, the ground-truth point clouds from AtlasNet were already zero-centered and scale-normalized to tightly fit a unit sphere, so additional normalization shouldn't be necessary. You could check whether you're applying the transformations consistently with these lines of code (before the camera extrinsic/intrinsic matrices): https://github.com/chenhsuanlin/photometric-mesh-optim/blob/master/data.py#L29 https://github.com/chenhsuanlin/photometric-mesh-optim/blob/master/model.py#L60

Hope this helps! Please let me know if you need further clarifications.

martinruenz commented 4 years ago

Hmm... maybe I am using the wrong ground-truth points; I got the clouds using this downloader.

Here is a self-contained script showing what I am doing; the normalization should be the same as this one:

import numpy as np
import matplotlib.pyplot as plt

path_camera = "path/camera.npz"
path_frames = "path/cbc5e6fce716e48ea28e529ba1f4836e.npy"
path_points = "path/cbc5e6fce716e48ea28e529ba1f4836e.points.ply.npy"
frame_index = 30
rotx = np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]], dtype=np.float64)
rotz = np.array([[0, 1, 0], [-1, 0, 0], [0, 0, 1]], dtype=np.float64)

def center_normalize_unitball_pmo(points):
    points = points - points.mean(axis=0, keepdims=True)
    return points / np.linalg.norm(points, axis=1).max()

def print_info(name, points):
    print(f'{name}, mean: {points.mean(axis=0)}, '
          f'max-dist: {np.linalg.norm(points, axis=1).max()}, '
          f'min-dist: {np.linalg.norm(points, axis=1).min()}')

# Load input
camera_data = np.load(path_camera)
K = camera_data['intr']
transforms_cw = camera_data['extr'][frame_index]
rgb = np.load(path_frames)[frame_index, :, :, :3]
points_3d = np.load(path_points)
print_info('input points', points_3d)

# Normalize
points_3d = center_normalize_unitball_pmo(points_3d)
print_info('normalized points', points_3d)

# To camera
points_3d = (rotz.T @ rotx @ points_3d.T).T
points_3d = transforms_cw @ np.concatenate([points_3d, np.ones([points_3d.shape[0], 1], dtype=np.float64)], axis=-1).T

# Project to 2D
points_2d = K @ points_3d
points_2d = points_2d[:2, ...] / points_2d[2, ...]

# Plot
plt.imshow(rgb)
plt.scatter(points_2d[0, ...], points_2d[1, ...], s=2, alpha=0.2, color='red')
plt.show()

The output is:

input points, mean: [-0.01417218 -0.07363589  0.02564108], max-dist: 0.4992210885820029, min-dist: 0.038848124935445726
normalized points, mean: [ 6.94359385e-16 -2.94482159e-15  7.52969909e-16], max-dist: 1.0, min-dist: 0.012235853184741012

This shows that the input cloud is not normalized yet. Here is the resulting plot: (image of the misaligned projection)

Any idea what's going wrong?

chenhsuanlin commented 4 years ago

That reminds me -- there were metadata files (*.points.ply2.txt) in the AtlasNet dataset accompanying the point clouds, which included scale and translation data. I had to use them to normalize the CAD models in Blender before rendering them. Hopefully that takes care of your problem!

martinruenz commented 4 years ago

I see. These files don't seem to be part of the downloader. Do you remember where you found them?

chenhsuanlin commented 4 years ago

I downloaded them here (they seem to have redone the dataset download pipeline).

martinruenz commented 4 years ago

Hey, I had another go at this today but still had no luck with the alignment. Could you check which exact normalization is performed?

chenhsuanlin commented 4 years ago

Sorry for the late reply. I was using the provided normalization as follows (suppose tx, ty, tz, and s are read from the file):

trans = (-tx/s,tz/s,-ty/s) # this involves some weird 90-degree rotation for ShapeNet v2 objects
scale = (1/s,1/s,1/s)

then in Blender (with the object selected),

bpy.ops.transform.translate(value=trans)
bpy.ops.transform.resize(value=scale)

These were the exact normalization steps I used to create the synthetic sequences. Hope this helps!

chenhsuanlin commented 4 years ago

Hi @martinruenz, I played with it a bit more, and it seems that there were some funny operations in my Blender script that complicated things. I double-checked, and the steps below should give you GT point clouds that are well aligned with the rendered images (again, suppose tx, ty, tz, and s are read from the file).

When the mesh is parsed from a ShapeNet .obj file, each vertex (x,y,z) should be normalized by

x' = (x-tx)/s
y' = (y-ty)/s
z' = (z-tz)/s

At this point, you should get a normalized mesh that tightly fits a zero-centered unit sphere, i.e. the max norm of all vertices should be 1 (perhaps differing by a small epsilon).

The extrinsic matrices were generated from Blender, and it seems that there is a funny 90-degree rotation introduced upon reading a ShapeNet .obj file 🤷‍♂️. To use the extrinsic matrices, you should first rotate the point cloud with (x",y",z") = (x',-z',y'); then, after applying the extrinsics and intrinsics, you should get projected points that align with the rendered images.
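(A minimal NumPy sketch of this pipeline, assuming K is the 3x3 intrinsic matrix and extr[frame_index] is a 3x4 world-to-camera matrix as in the script above, with tx, ty, tz, and s read from the metadata file.)

import numpy as np

def normalize_shapenet(vertices, tx, ty, tz, s):
    # (x', y', z') = ((x-tx)/s, (y-ty)/s, (z-tz)/s): mesh tightly fits a zero-centered unit sphere
    return (vertices - np.array([tx, ty, tz])) / s

def to_blender_frame(points):
    # the 90-degree rotation introduced when reading a ShapeNet .obj:
    # (x'', y'', z'') = (x', -z', y')
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.stack([x, -z, y], axis=-1)

def project(points, K, extr_cw):
    # apply the 3x4 extrinsic to homogeneous points, then the intrinsics
    points_h = np.concatenate([points, np.ones((points.shape[0], 1))], axis=-1)
    uv = K @ (extr_cw @ points_h.T)
    return (uv[:2] / uv[2]).T  # (N, 2) pixel coordinates

# vertices = normalize_shapenet(raw_obj_vertices, tx, ty, tz, s)
# points_2d = project(to_blender_frame(vertices), K, extr[frame_index])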

Apologies again about the confusion! The script was too outdated and I was kind of confused myself. Hopefully this works well for you.

martinruenz commented 4 years ago

Thanks, I got it. This works for ShapeNetv2 models.