Hello,
I have some questions regarding RefCOCO/+/g training / evaluation details.
Are you going to upload the RefCOCO/+/g training/evaluation code?
Which boxes did you finetune UNITER on?
Which boxes did you use for evaluation on val, test, val^d, and test^d, respectively? Did you use the Mask R-CNN boxes from MAttNet?
Table from UNITER
It seems the ViLBERT-MT authors finetuned their model on 100 BUTD boxes + Mask R-CNN boxes from MAttNet -> code.
They then used 100 BUTD boxes during evaluation -> code.
I calculated oracle scores on RefCOCOg val split: "if there exists a candidate box with iou(candidate,target) > 0.5 => correct"
Mask R-CNN boxes from MAttNet -> 86.10%
MS COCO GT boxes -> 99.6%
ViLBERT-MT's 100 BUTD boxes on RefCOCOg -> 96.53%
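For reference, the oracle score described above can be sketched roughly as follows. This is only a minimal illustration, not the script I actually ran: the function names `iou` and `oracle_accuracy` are hypothetical, and boxes are assumed to be in (x1, y1, x2, y2) corner format (COCO annotations store (x, y, w, h), so they would need converting first).

```python
from typing import Sequence

Box = Sequence[float]  # assumption: (x1, y1, x2, y2) corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def oracle_accuracy(
    candidates_per_ref: Sequence[Sequence[Box]],
    targets: Sequence[Box],
    thresh: float = 0.5,
) -> float:
    """Fraction of referring expressions for which at least one candidate
    box has IoU with the target strictly greater than `thresh`."""
    if not targets:
        return 0.0
    hits = sum(
        any(iou(c, t) > thresh for c in cands)
        for cands, t in zip(candidates_per_ref, targets)
    )
    return hits / len(targets)
```

Here `candidates_per_ref[i]` would hold the candidate boxes (e.g. the MAttNet Mask R-CNN detections or the 100 BUTD boxes) for the image of the i-th expression, and `targets[i]` its ground-truth box. The score is an upper bound on what any model choosing among those candidates can reach.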
Since the BUTD boxes have better coverage than the Mask R-CNN boxes from MAttNet, I don't think this is a fair comparison to MAttNet. It is also not consistent with the ViLBERT-MT paper.
Paragraph from ViLBERT-MT
ViLBERT-MT authors compared ViLBERT-MT and UNITER on test^d. I wonder which boxes you used for UNITER finetuning and evaluation.
We finetuned on ground-truth (COCO-annotated) boxes, with features extracted using BUTD, and ran inference on:
1) ground-truth boxes
2) MAttNet's detected boxes
Table from ViLBERT-MT