cvg / EMAP

[CVPR'24] 3D Neural Edge Reconstruction
https://neural-edge-map.github.io/
MIT License

Questions on Evaluations #3

Open · C-H-Chien opened this issue 2 weeks ago

C-H-Chien commented 2 weeks ago

I have a few questions on your evaluations:

  1. For the ABC_NEF dataset, the GT parametric curves have to be shifted and scaled so that they are located at the correct coordinates (see this issue). And since this dataset is synthetic, the 3D points are unitless. How do you relate the threshold $\tau$ in millimeters mentioned in your paper to a threshold on the unitless 3D GT? (In the NEF paper, they set the threshold to 0.02, with no unit, as the L2 distance from a reconstructed edge point to the ground-truth point.)

  2. For generating the 3D GT edges from the DTU dataset, the supplementary says that the dense point cloud is projected onto the images, and these projections are then cross-compared with the 2D edge observations to mark 3D edges. Is this cross-comparing done manually? If not, what strategies/algorithms did you use? In addition, the supplementary says (Section B.3): "To ensure accuracy in the ground-truth edge points, we manually set thresholds for each scan and meticulously remove any floating points." Which thresholds and floating points are you referring to here?

  3. For the metrics, what are the definitions of "Accuracy" and "Completeness"? The paper says it follows NEF and LiMAP for the metrics, but I could not find the definitions in either paper. Maybe it is a typo, as I did find the definitions in the NEAT paper (Section F of its supplementary). Also, do you have an explicit definition of "edge direction consistency"?

  4. In Table 1, are the reported numbers averaged over all 82 selected CAD models? Do you use all 50 images per CAD model?

rayeeli commented 2 weeks ago

Thanks for your good questions!

  1. As mentioned in that issue, they scale and shift the real CAD models before rendering the images. This means the images in the ABC_NEF dataset are rendered from unit CAD models, scaled to lie within [0, 1 m]. Therefore, our millimeter-based metrics on the ABC-NEF dataset are physically meaningful.

  2. For "cross-comparing," we only retain edge points that are visible in most views, based on a visibility ratio. This ratio is adjusted per scan to match the properties of the views. Even after this adjustment, some noisy edge points can remain, such as floating points, which we remove manually to ensure the edge points are clean.

  3. Accuracy is the mean distance from predicted edge points to ground-truth (GT) edge points; completeness is the mean distance from GT edge points to predicted edge points. They are the intermediate quantities from which precision and recall are derived. Edge direction consistency is computed as the cosine similarity between the edge directions of predicted points and those of their corresponding GT edge points (see the sketch at the end of this comment).

  4. Yes, the reported numbers are averaged over 82 CAD models, and all methods use 50 images from the ABC-NEF dataset per CAD model.
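For concreteness, here is a minimal sketch of how these quantities could be computed, assuming both point sets are (N, 3) NumPy arrays and edge directions are unit tangent vectors. The function names, the SciPy nearest-neighbor lookup, and the absolute value on the cosine (edge tangents have no canonical sign) are my own illustration, not necessarily what the released evaluation code does:

```python
# Hypothetical sketch of the metrics described above; not the repo's exact evaluation code.
import numpy as np
from scipy.spatial import cKDTree

def accuracy_completeness(pred_pts, gt_pts):
    """Accuracy: mean distance pred -> GT. Completeness: mean distance GT -> pred."""
    d_pred_to_gt, _ = cKDTree(gt_pts).query(pred_pts)  # nearest GT point per prediction
    d_gt_to_pred, _ = cKDTree(pred_pts).query(gt_pts)  # nearest prediction per GT point
    return d_pred_to_gt.mean(), d_gt_to_pred.mean()

def precision_recall(pred_pts, gt_pts, tau):
    """Fraction of points whose nearest neighbor in the other set lies within tau."""
    d_pred_to_gt, _ = cKDTree(gt_pts).query(pred_pts)
    d_gt_to_pred, _ = cKDTree(pred_pts).query(gt_pts)
    return (d_pred_to_gt < tau).mean(), (d_gt_to_pred < tau).mean()

def direction_consistency(pred_pts, pred_dirs, gt_pts, gt_dirs):
    """Mean |cosine similarity| between each predicted direction and the
    direction of its nearest GT point (sign-invariant by the absolute value)."""
    _, idx = cKDTree(gt_pts).query(pred_pts)
    return np.abs(np.sum(pred_dirs * gt_dirs[idx], axis=1)).mean()
```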

C-H-Chien commented 2 weeks ago

Thank you for the detailed responses! For the second question, does the provided dataset include the GT 3D edges?

rayeeli commented 2 weeks ago

We have updated the dataset and the evaluation code; please take a look.

C-H-Chien commented 2 weeks ago

Hi! Thanks for the update. I have a few more questions, and I hope I am not being silly :)

  1. Where does the unit for the ABC_NEF dataset come from? Could you kindly point me to where the range of [0, 1 m] is specified? As far as I know, the models lie within a [0, 1] bounding box with no unit. Thank you!

  2. When displaying the ground-truth 3D edges from the DTU dataset in MeshLab, I found that the 3D edges are not crisp and clean, e.g., in scan 105:

[screenshot: GT 3D edge points of DTU scan 105 rendered in MeshLab]

One hypothesis is that the 2D edges you used are not crisp; this can be observed in the 2D edge maps of PiDiNet and DexiNed. Supposing that PiDiNet and DexiNed are accurate, the question is then: how do you know the obtained GT 3D edges are true 3D edges? If I produce a crisp 3D edge reconstruction, I would expect my method to score a very low recall because of the non-crisp 3D GT edges.

rayeeli commented 1 week ago

Hey, for your first question: the CAD models are scaled by dividing each axis by the length of the largest side of the bounding box, resulting in coordinates within [0, 1]. Additionally, the camera poses used during rendering are specified in meters by default. Thus, if we reconstruct a 3D model from the given camera poses and rendered images, it is measured in meters.
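In code, the normalization described above could look like the following sketch (assuming the model is given as an (N, 3) vertex array; `normalize_to_unit_cube` is a hypothetical helper, not part of the ABC_NEF tooling):

```python
import numpy as np

def normalize_to_unit_cube(vertices):
    """Shift the model to the origin and divide every axis by the longest
    bounding-box side, so it fits inside [0, 1]^3 with aspect ratio preserved."""
    mins = vertices.min(axis=0)
    longest_side = (vertices.max(axis=0) - mins).max()
    return (vertices - mins) / longest_side
```

Under this convention, one normalized unit corresponds to 1 m, so NEF's unitless threshold of 0.02 would correspond to 20 mm.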

For your second question, it is difficult to obtain perfectly clean edge points. We obtained these edge points by projecting the ground-truth dense point clouds onto all 2D image planes and checking whether they land in edge areas. Because the ground-truth edges are blurred, we report precision and recall within a given threshold to reduce the effect of the blurred ground-truth edge points, which is exactly the issue you mention. A rough sketch of the projection step is below.
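This is how the projection-and-check step could look, assuming pinhole intrinsics `K`, world-to-camera rotation/translation `(R, t)`, and one binary 2D edge map per view. Everything here (the names, the `min_ratio` default, and the omission of occlusion handling) is my own illustrative sketch, not the repository's actual code:

```python
import numpy as np

def mark_gt_edges(points, views, min_ratio=0.6):
    """Keep 3D points that project into an edge region in a sufficient
    fraction of the views in which they are observable.

    points: (N, 3) GT dense point cloud in world coordinates.
    views:  list of (K, R, t, edge_map) tuples with K (3, 3), R (3, 3),
            t (3,), and edge_map an (H, W) boolean array.
    """
    votes = np.zeros(len(points), dtype=int)  # views where the point hits an edge pixel
    seen = np.zeros(len(points), dtype=int)   # views where the point is observable
    for K, R, t, edge_map in views:
        cam = points @ R.T + t                           # world -> camera coordinates
        in_front = cam[:, 2] > 1e-6
        pix = cam @ K.T                                  # homogeneous pixel coordinates
        uv = pix[:, :2] / np.maximum(pix[:, 2:3], 1e-6)  # perspective division (masked below)
        u = np.floor(uv[:, 0]).astype(int)
        v = np.floor(uv[:, 1]).astype(int)
        h, w = edge_map.shape
        inside = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        seen += inside
        hit = np.zeros(len(points), dtype=bool)
        hit[inside] = edge_map[v[inside], u[inside]]
        votes += hit
    ratio = votes / np.maximum(seen, 1)                  # per-point visibility ratio
    return points[ratio >= min_ratio]
```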

C-H-Chien commented 1 week ago

Hi! Thanks for your kind response :) For your first answer, it would be great if you could provide some references, as I could not find the unit in either the ABC_NEF paper or the ABC dataset paper (this could be my mistake). Nevertheless, I appreciate your explanation.

For the second answer, recall needs false negatives, i.e., 3D GT edges that are not covered by the reconstructed edges. How do false negatives relate to the threshold? Suppose there are no reconstructed 3D edges on the cheek of the bear in DTU scan 105, but there are a lot of 3D GT edges on the cheek: do you count all the GT edges on the cheek as false negatives?

I apologize for asking so many questions. Thanks for your consideration.