facebookresearch / meshrcnn

code for Mesh R-CNN, ICCV 2019

How to evaluate on a multi-object dataset? #73

Closed bluestyle97 closed 4 years ago

bluestyle97 commented 4 years ago

Hi, I'm training Mesh R-CNN on a custom 3D dataset. Unlike ShapeNet or Pix3D, each image contains multiple objects rather than a single one. However, I found that pix3d_evaluation.py assumes there is only one ground-truth instance annotation per input image (https://github.com/facebookresearch/meshrcnn/blob/master/meshrcnn/evaluation/pix3d_evaluation.py#L259), which makes the evaluation code inapplicable to multi-object annotations. To work around this, I used a for-loop to gather the ground-truth boxes, masks, and meshes into lists, after which I can compute box IoU and mask IoU normally.

However, I'm not sure how to compute the mesh metrics, since the compare_meshes function (in meshrcnn/utils/metrics.py) assumes that each predicted mesh has already been matched with its ground-truth mesh (https://github.com/facebookresearch/meshrcnn/blob/master/meshrcnn/evaluation/pix3d_evaluation.py#L341). In the multi-object case, the network may output N predicted meshes while the image has M ground-truth meshes. Could you please advise how to match the predicted meshes to the ground-truth meshes, and how to modify the code so the evaluation is feasible?

gkioxari commented 4 years ago

Ok so let's explain how we compute AP in object recognition. The evaluation code is indeed simplified for Pix3D, but is actually really close to the general case.

Let's say you have N ground truth objects and M detections in an image. When you want to report a metric (any metric, e.g. mesh AP), you compare the N gt meshes with the M predicted meshes. This gives you an N×M array of the similarity you want to compute - for masks this is IoU, for meshes it's F1.
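
Concretely, that N×M similarity matrix for meshes could be built with the repository's compare_meshes helper (meshrcnn/utils/metrics.py). The sketch below is illustrative, not the repo's actual evaluation code: the metric key "F1@0.300000" and the reduce keyword are assumptions based on one version of the module and may differ in yours.

```python
import torch
from pytorch3d.structures import Meshes

# compare_meshes is the mesh-metric helper from meshrcnn/utils/metrics.py.
# Its keyword arguments and the metric key used below are assumptions taken
# from one version of the repo -- check your local copy.
from meshrcnn.utils.metrics import compare_meshes


def mesh_similarity_matrix(gt_meshes, pred_meshes, metric="F1@0.300000"):
    """Build an N x M similarity matrix: N ground-truth meshes vs. M predicted meshes.

    gt_meshes, pred_meshes: lists of single-element pytorch3d Meshes objects.
    """
    N, M = len(gt_meshes), len(pred_meshes)
    sim = torch.zeros(N, M)
    for i in range(N):
        for j in range(M):
            # compare_meshes compares batched Meshes element-wise,
            # so we pass one (pred, gt) pair at a time here.
            metrics = compare_meshes(pred_meshes[j], gt_meshes[i], reduce=True)
            sim[i, j] = metrics[metric]
    return sim
```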

To compute the AP, you sort the predictions in descending order of score (this is the classifier score for object recognition). A detection is mapped to the ground truth with the largest similarity (so there is a mapping between ground truth and prediction!). If that ground truth has not already been mapped to another (higher-scoring, obviously) prediction and the similarity is above a threshold (in our case 0.5 for both mask IoU and mesh F1), then the prediction is a true positive. Otherwise, it's a false positive.

Now that every prediction carries a tp/fp label, you are ready to compute AP.
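
A minimal sketch of the greedy matching and AP computation described above, assuming a precomputed similarity matrix and per-prediction scores (the repository's own evaluation uses its own AP routine, so details such as the precision/recall interpolation may differ):

```python
import numpy as np


def label_true_positives(sim, scores, thresh=0.5):
    """Greedily label each prediction as TP/FP from an N x M similarity matrix.

    sim:    (N, M) array of gt-vs-prediction similarities (mask IoU or mesh F1).
    scores: (M,) detection confidence for each prediction.
    Returns a boolean TP array ordered by descending score.
    """
    order = np.argsort(-np.asarray(scores))      # highest-scoring prediction first
    gt_taken = np.zeros(sim.shape[0], dtype=bool)
    tp = np.zeros(len(order), dtype=bool)
    for k, j in enumerate(order):
        i = int(np.argmax(sim[:, j]))            # ground truth with the largest similarity
        if sim[i, j] >= thresh and not gt_taken[i]:
            tp[k] = True
            gt_taken[i] = True                   # each gt can match at most one prediction
    return tp


def average_precision(tp, num_gt):
    """AP from a TP/FP sequence already sorted by descending score."""
    tp = tp.astype(np.float64)
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / max(num_gt, 1)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-8)
    # area under the precision-recall curve (un-interpolated; VOC-style variants differ)
    return float(np.sum(np.diff(np.concatenate(([0.0], recall))) * precision))
```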

bluestyle97 commented 4 years ago

Thanks a lot for your nice answer! I have figured out most of the details with your help but still have one question. Since different metrics may lead to different mappings from predictions to ground truth (e.g., prediction M1 has a higher mask IoU with gt object N1 but a higher mesh F1 score with gt object N2), the mapping will differ when computing mask IoU and mesh F1. Is this right? In other words, if we treat mask prediction, box prediction, and mesh prediction as different sub-tasks, are these sub-tasks independent of each other when computing their metrics (AP)? Don't they need a unified mapping from the predictions to the ground truth?

gkioxari commented 4 years ago

Correct! Different metrics lead to different mappings between ground truth and predictions.

bluestyle97 commented 4 years ago

Thank you very much! My problem has been solved. I will close this issue.

kaphleamrit2 commented 2 years ago

Hi @bluestyle97.

I am also trying to train Mesh R-CNN on my custom dataset. Could you please explain how you trained on your custom dataset? I have 2D images, their segmentation masks, and the corresponding 3D .obj files. Which data format should I use so that I can run training on my dataset?

Also, how did you solve your problem with evaluation on multiple objects? If you could share your approach, it would be very helpful.

Thank you.

Amrit