Ok so let's explain how we compute AP in object recognition. The evaluation code is indeed simplified for Pix3D, but is actually really close to the general case.
Let's say you have N ground truth objects and M detections in an image. When you want to report a metric (any metric, e.g. mesh AP), you compare the N gt meshes with the M predicted meshes. This gives you an N x M array of the similarity you want to compute - for masks this is IoU, for meshes it's F1.
To compute the AP, you sort the predictions by descending score (for object recognition this is the classifier score). Each detection is mapped to the ground truth with the largest similarity (so there is a mapping between ground truth and predictions!). If that ground truth has not already been matched to another (higher-scoring, obviously) prediction and the similarity is above a threshold (in our case 0.5 for both mask IoU and mesh F1), then the prediction is a true positive. Otherwise, it's a false positive.
Now that you have mapped all predictions with a tp/fp label, you are ready to compute AP.
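For concreteness, here is a minimal sketch of that matching and AP computation. This is my own simplified illustration, not the actual code in pix3d_evaluation.py; the function names are made up, and the step-wise precision integration at the end is an assumption (the real evaluator may interpolate precision differently):

```python
import numpy as np

def match_and_label(scores, similarity, thresh=0.5):
    """Greedily match detections to ground truth and label each as TP/FP.

    scores:     (M,) detection confidence scores.
    similarity: (N, M) similarity between each GT (rows) and each detection
                (cols), e.g. mask IoU or mesh F1.
    Returns a boolean TP array aligned with the detections sorted by
    descending score, plus that sort order.
    """
    order = np.argsort(-scores)                # highest-scoring detections first
    gt_used = np.zeros(similarity.shape[0], dtype=bool)
    tp = np.zeros(len(order), dtype=bool)
    for i, d in enumerate(order):
        g = int(np.argmax(similarity[:, d]))   # best-matching ground truth
        if similarity[g, d] >= thresh and not gt_used[g]:
            tp[i] = True
            gt_used[g] = True                  # each GT may be matched at most once
    return tp, order

def average_precision(tp, num_gt):
    """AP as the area under the precision-recall curve built from the
    cumulative TP/FP counts (detections already sorted by score)."""
    tp_cum = np.cumsum(tp)
    fp_cum = np.cumsum(~tp)
    recall = tp_cum / max(num_gt, 1)
    precision = tp_cum / np.maximum(tp_cum + fp_cum, 1e-8)
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_r)                 # step-wise sum over recall increments
        prev_r = r
    return ap
```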
Thanks a lot for your nice answer! I have figured out most of the details with your help, but I still have one question. Different metrics may lead to different mappings from predictions to ground truth: e.g., prediction M1 may have a higher mask IoU with gt object N1 but a higher mesh F1 score with gt object N2, so the mapping will differ between computing mask IoU and mesh F1. Is this right?
In other words, if we see the mask, box, and mesh predictions as different sub-tasks, are these sub-tasks evaluated independently of each other when computing their metrics (AP)? Or do they need a unified mapping from predictions to ground truth?
Correct! Different metrics lead to different mappings between ground truth and predictions.
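As a toy illustration (reusing the hypothetical match_and_label sketch above, with made-up numbers), the same two detections can be assigned to different ground truths depending on which similarity you use:

```python
import numpy as np

# Two GTs (N=2), two detections (M=2); detection 0 has the higher score.
scores = np.array([0.9, 0.8])

# Detection 0 overlaps GT 0 best in mask IoU ...
mask_iou = np.array([[0.80, 0.40],
                     [0.55, 0.75]])
# ... but its reconstructed mesh is closest to GT 1 in F1.
mesh_f1 = np.array([[0.52, 0.70],
                    [0.85, 0.60]])

tp_mask, _ = match_and_label(scores, mask_iou)  # det 0 -> GT 0, det 1 -> GT 1
tp_mesh, _ = match_and_label(scores, mesh_f1)   # det 0 -> GT 1, det 1 -> GT 0
```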
Thank you very much! My problem has been solved. I will close this issue.
Hi @bluestyle97.
I am also trying to train Mesh R-CNN on my custom dataset. Could you please explain how you trained on your custom dataset? I have 2D images, their segmentation masks, and the corresponding 3D .obj files. Which data format should I use so that I can run training on my dataset?
Also, how did you solve your problem with evaluation on multiple objects? If you can share your idea, it would be helpful.
Thank you.
Amrit
Hi, I'm training Mesh R-CNN on a custom 3D dataset. Unlike ShapeNet or Pix3D, there are multiple objects in each image instead of only one. However, I found that pix3d_evaluation.py assumes there is only one ground-truth instance annotation per input image (https://github.com/facebookresearch/meshrcnn/blob/master/meshrcnn/evaluation/pix3d_evaluation.py#L259), which makes the evaluation code inapplicable to multi-object annotations. To work around this, I used a for-loop to gather the ground-truth boxes, masks, and meshes into lists, so I can compute the box IoU and mask IoU as usual. However, I'm confused about how to compute the mesh metrics, since the compare_meshes function (in meshrcnn/utils/metrics.py) assumes that each predicted mesh has already been matched with its ground-truth mesh (https://github.com/facebookresearch/meshrcnn/blob/master/meshrcnn/evaluation/pix3d_evaluation.py#L341). In the multi-object case, the network may output N predicted meshes while there are M ground-truth meshes for the image. Could you please give me some advice on how to match the predicted meshes to the ground-truth meshes, and how to modify the code to make the evaluation feasible?
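Concretely, what I am considering is something like the sketch below: score every (gt, pred) pair with compare_meshes to build the N x M mesh-F1 matrix, and then feed it into the same greedy matching used for masks. This is only a sketch; I am assuming compare_meshes accepts a single-mesh pytorch3d Meshes for each argument and that "F1@0.300000" is among its returned keys, so please check against the version of meshrcnn/utils/metrics.py you are running:

```python
import torch
from meshrcnn.utils.metrics import compare_meshes

def mesh_f1_matrix(pred_meshes, gt_meshes, key="F1@0.300000"):
    """Build the N x M similarity matrix between N GT meshes and M
    predicted meshes by scoring every (gt, pred) pair with compare_meshes.

    pred_meshes / gt_meshes: lists of single-mesh pytorch3d Meshes.
    Returns an (N, M) tensor of F1 scores.
    """
    N, M = len(gt_meshes), len(pred_meshes)
    sim = torch.zeros(N, M)
    for i in range(N):
        for j in range(M):
            metrics = compare_meshes(pred_meshes[j], gt_meshes[i])
            sim[i, j] = metrics[key]
    return sim
```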