Yuqifan1117 / CaCao

This is the official repository for the paper "Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World" (Accepted by ICCV 2023)

Question about the number of predicates in the CaCao-enhanced dataset #17

Closed by Yassin-fan 8 months ago

Yassin-fan commented 8 months ago

Hello, I downloaded the enhanced VG-50 dataset you provided and compared it with the original VG-50.h5 dataset.

I found that only 25 predicates increased in count, while the other 25 actually decreased.

In addition, the per-predicate counts do not match those in Tables 6 and 7 of the paper.

To compare, I counted the occurrences of each value in the 'predicates' column of each .h5 file.
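For reference, a minimal sketch of this kind of comparison. It assumes the labels are stored under the key `predicates` as integer ids and that `h5py` is available for reading the files; the helper names and the toy label lists are illustrative, not taken from the repository or from real VG-50 data.

```python
from collections import Counter


def load_predicates(h5_path, key="predicates"):
    """Read the predicate label column from an .h5 file as a flat list of ints."""
    import h5py  # imported lazily so the counting helpers work without it

    with h5py.File(h5_path, "r") as f:
        return f[key][:].ravel().tolist()


def count_predicates(labels):
    """Count how many times each predicate id occurs."""
    return Counter(int(x) for x in labels)


def compare_counts(original, enhanced):
    """Return {predicate_id: enhanced_count - original_count} for every id seen."""
    ids = sorted(set(original) | set(enhanced))
    return {i: enhanced[i] - original[i] for i in ids}


# Toy labels standing in for the two files (not real VG-50 data).
orig_counts = count_predicates([1, 1, 1, 2, 3, 3])
enh_counts = count_predicates([1, 2, 2, 2, 3, 3])
delta = compare_counts(orig_counts, enh_counts)
increased = [i for i, d in delta.items() if d > 0]
decreased = [i for i, d in delta.items() if d < 0]
```

With real files, the two toy lists would be replaced by `load_predicates(...)` calls on the original and enhanced .h5 paths.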

May I ask whether you did any additional processing or filtering after enhancing the predicates?

Thanks!

Yuqifan1117 commented 8 months ago

Thanks for your attention. To mitigate the long-tail imbalance and ensure the quality of the generated data, we further (1) filtered out triples whose subject and object bounding boxes do not overlap and (2) mapped the enhanced coarse-grained predicates to the fine-grained target predicates (25 categories). The result is the enhanced VG-50 dataset. The appendix illustrates the trends in CaCao's visual data generation, but that is not the dataset used for final training.
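The two post-processing steps can be sketched roughly as follows. This is only an illustration under stated assumptions, not the repository's actual code: boxes are taken to be `(x1, y1, x2, y2)`, "overlap" is taken to mean a positive-area intersection, and `coarse_to_fine` is a hypothetical lookup table standing in for whatever mapping CaCao uses.

```python
def boxes_overlap(a, b):
    """True if boxes a and b (x1, y1, x2, y2) share a region of positive area."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    return inter_w > 0 and inter_h > 0


def postprocess(triples, boxes, coarse_to_fine):
    """Keep overlapping subject/object pairs and relabel their predicates.

    triples: list of (subject_idx, predicate, object_idx)
    boxes: list of (x1, y1, x2, y2), indexed by subject/object idx
    coarse_to_fine: hypothetical mapping; predicates not in it are kept as-is
                    (an assumption made for this sketch)
    """
    kept = []
    for s, p, o in triples:
        # Step (1): drop triples whose two boxes do not overlap.
        if not boxes_overlap(boxes[s], boxes[o]):
            continue
        # Step (2): map the coarse predicate to its fine-grained target.
        kept.append((s, coarse_to_fine.get(p, p), o))
    return kept


# Toy example: the second triple is dropped because its boxes are disjoint.
boxes = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30)]
triples = [(0, "on", 1), (0, "on", 2)]
result = postprocess(triples, boxes, {"on": "standing on"})
```

Filtering of this kind removes samples, which is consistent with some predicate counts ending up lower than in the original VG-50 file.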

Yassin-fan commented 8 months ago

Thank you for your reply. May I ask whether these 25 categories were chosen empirically, or according to specific criteria?

I ask because I noticed that the enhanced predicate categories are not the 25 categories with the fewest samples in the original dataset.

Yuqifan1117 commented 8 months ago

The selection of these 25 classes is based on the relationship dependence in [1], but differs slightly because of sample size. We chose fine-grained predicates that have relatively few samples and that express visual semantics rather than simple relationships.

[1] Vincent S. Chen, Paroma Varma, Ranjay Krishna, Michael Bernstein, Christopher Ré, and Li Fei-Fei. Scene Graph Prediction with Limited Labels.