Jingkang50 / OpenPSG

Benchmarking Panoptic Scene Graph Generation (PSG), ECCV'22
https://psgdataset.org
MIT License

How are the refine_bboxes and the objects mapped? #71

Closed soham-joshi closed 1 year ago

soham-joshi commented 1 year ago

During inference, given an image, the model returns a dictionary containing a key 'refine_bboxes'. How can I identify which bounding box in results['refine_bboxes'] maps to which object?

Also, a single box is represented by 5 values; what do the entries of this 5-dimensional array correspond to?

soham-joshi commented 1 year ago

@Jingkang50 @c-liangyu @yizhe-ang @wangae could you help me with the above query, please?

GSeanCDAT commented 1 year ago

Hi Joshi, thanks for your interest in our work. A single bounding box can be identified either by cxcywh (the center coordinates plus the width and height of the box) or by xyxy (a pair of diagonal vertices of the box), so only four values are needed. In our work we use the latter format. To match the ground-truth boxes, we compute the IoU between each predicted box and the ground truths and take the best-matching ground truth as the matched object.
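The conversion and matching described above can be sketched as follows (a minimal illustration, not the repository's actual code; function names are hypothetical):

```python
def cxcywh_to_xyxy(box):
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def iou(a, b):
    """Intersection-over-union of two boxes in xyxy format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def match_to_gt(pred_boxes, gt_boxes):
    """For each predicted box, return the index of the GT box with highest IoU."""
    return [max(range(len(gt_boxes)), key=lambda j: iou(p, gt_boxes[j]))
            for p in pred_boxes]
```

This mirrors the matching rule stated above: every prediction is assigned to whichever ground-truth box overlaps it the most.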

soham-joshi commented 1 year ago

Thank you for your response @GSeanCDAT.
results.refine_bboxes gives the bounding boxes of the objects in the scene, and show_results( ) also outputs the edges of the scene graph. I want to create a positional embedding for every edge by concatenating the subject and object, i.e. [subject | object]. I understand that results.refine_bboxes gives the bounding boxes of all the things (subjects and objects) in the scene. My question: when building the positional embedding of an edge, how do we obtain the positional embeddings of the subject and the object from results.refine_bboxes? Thanks in advance!
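The [subject | object] positional embedding I have in mind could be sketched like this (a minimal sketch; it assumes xyxy boxes and that the subject/object box for each edge is already known, and the function names are hypothetical):

```python
def box_embedding(box, img_w, img_h):
    """Normalize an xyxy box to [0, 1] by the image size."""
    x1, y1, x2, y2 = box
    return [x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h]

def edge_positional_embedding(subj_box, obj_box, img_w, img_h):
    """Concatenate normalized subject and object boxes: [subject | object]."""
    return (box_embedding(subj_box, img_w, img_h)
            + box_embedding(obj_box, img_w, img_h))
```

The open question is only the mapping step: which rows of results.refine_bboxes to pass in as subj_box and obj_box for a given edge.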

soham-joshi commented 1 year ago

@GSeanCDAT could you help me with the above query, please?

GSeanCDAT commented 1 year ago

Hi Joshi, from results.refine_bboxes we can only get the coordinates of the predicted subject and object bounding boxes. If you are looking for the features of the subjects and objects predicted by the model, you can find them in the model's forward function; you may need to add extra outputs to the forward function yourself. In the forward pass, the subjects' and objects' features are available in most models, except for PSGTR, where we model the triplet as a whole.
You may also refer to this #58.

soham-joshi commented 1 year ago

Thank you @GSeanCDAT and @Jingkang50 !