Open sharifza opened 4 years ago
Now I understand that your reported numbers are in fact not comparable to Neural Motifs. I consider this some sort of [unintended?] mistake in reporting the results.
In NM (and most previous work), SGCls is defined as a setting where bounding boxes are given but edges are not, and we evaluate the quality of the "detected" and "classified" edges. In your work, you have changed the definition of SGCls to a setting where both bounding boxes and edges are given, and the goal is only to evaluate the quality of "classifying" edges. While I understand your motivation behind this change (given the name "Scene Graph Classification"), putting these under the same title in the table will totally mislead the community.
@sharifza if you could share the code fixing the evaluation of the models in this repo, it would be great! I still see they rank triplets here https://github.com/NVIDIA/ContrastiveLosses4VRD/blob/master/lib/datasets_rel/task_evaluation_vg_and_vrd.py#L84, so I'm not sure where exactly their evaluation goes wrong.
@bknyaz I avoided using this repository for my research. No one responded to my complaint for a year. The mentioned evaluation issue affects the heart of this paper's contribution and calls the validity of all its results into question. There are other repositories that I recommend you take a look at: Neural Motifs [PyTorch 0.3], Depth-VRD (Neural Motifs [PyTorch > 1.0]), and the recent benchmark by @kaihuatang. Kaihua also pointed out this issue (Two Common Misunderstandings in SGG Metrics).
The main problem is that the evaluation for VRD and VG is done in the same file even if the metrics are slightly different. The metrics used in VRD are the following:
The metrics used in VG are:
In PredDet, the (subject, object) pairs are given, as pointed out in this issue, whereas in PredCls and SGCls they are not. This is the problem with this implementation.
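To make the difference concrete, here is a minimal toy sketch of Recall@K under the two settings. It is not code from this repository: the data, the `MODEL_OUTPUT` table, and the helper names `candidate_pairs` and `recall_at_k` are all hypothetical and only illustrate why numbers computed with given pairs cannot be compared to numbers computed by ranking all pairs.

```python
from itertools import permutations

# Hypothetical toy image with 3 boxes and 2 ground-truth relationships,
# written as (subject_idx, predicate_label, object_idx).
NUM_BOXES = 3
GT_TRIPLETS = [(0, 5, 1), (1, 7, 2)]

# Hypothetical model output: for every ordered box pair,
# the best predicate label and its confidence.
MODEL_OUTPUT = {
    (0, 1): (5, 0.9),  # correct predicate, high confidence
    (1, 2): (7, 0.3),  # correct predicate, low confidence
    (0, 2): (4, 0.8),  # unannotated pair, high confidence
    (2, 1): (6, 0.7),  # unannotated pair, high confidence
    (1, 0): (5, 0.2),
    (2, 0): (3, 0.1),
}


def candidate_pairs(pairs_given):
    """Return the (subject, object) pairs that the model is asked to score."""
    if pairs_given:
        # Pairs are handed to the model: only annotated pairs are scored.
        return sorted({(s, o) for s, _, o in GT_TRIPLETS})
    # Pairs are not given: every ordered pair of boxes is a candidate.
    return list(permutations(range(NUM_BOXES), 2))


def recall_at_k(pairs_given, k):
    """Fraction of ground-truth triplets found among the top-k scored triplets."""
    scored = []
    for s, o in candidate_pairs(pairs_given):
        pred, conf = MODEL_OUTPUT[(s, o)]
        scored.append((conf, (s, pred, o)))
    scored.sort(key=lambda item: item[0], reverse=True)
    top_k = {triplet for _, triplet in scored[:k]}
    return sum(t in top_k for t in GT_TRIPLETS) / len(GT_TRIPLETS)


recall_pairs_given = recall_at_k(pairs_given=True, k=2)
recall_all_pairs = recall_at_k(pairs_given=False, k=2)
print(recall_pairs_given, recall_all_pairs)
```

With the same model output, the low-confidence but correct triplet survives when pairs are given and is pushed out of the top K by unannotated pairs when all pairs have to be ranked.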
Hope this helps! 👍
I have a question. I don't understand why (in Visual Genome) SGDet gains such a small improvement over Neural Motifs, whereas SGCls gains a much larger one. Isn't the only difference in the region proposal network?