How to generate the scene graph based on the bottom-up features?

Einstone-rose commented 4 years ago

❓ Questions and Help

In the bottom-up features, every image comes with 2048-dim features and a five-tuple of bounding-box features (x1, y1, x2, y2, w*h). The paper "Auto-Encoding Scene Graphs for Image Captioning" doesn't give details on how to generate the scene graph from the bottom-up features instead of raw images. Which protocol (PredCls, SGCls or SGDet) should I use for this? I would be very grateful for any clues. Thanks.

KaihuaTang commented 4 years ago

We don't need to use the bottom-up features, and actually we can't, because our SGG model needs union box features, which are not provided by the bottom-up features. Instead, you just need to take the bottom-up bounding boxes and feed them to any pre-trained SGG model, so you can generate a scene graph based on the bottom-up boxes. That's SGCls, i.e., the bounding boxes are given. However, you need to rewrite the dataloader.
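To make "union box" concrete, here is a minimal sketch (not code from this repository) of how candidate subject-object pairs and their union boxes can be built from bottom-up style boxes. The function names `union_box` and `candidate_pairs` and the sample rows are hypothetical. The union box is the smallest box enclosing both boxes of a pair; its features have to be pooled from the image itself, which is presumably why precomputed per-box features are not enough.

```python
from itertools import permutations


def union_box(a, b):
    """Smallest (x1, y1, x2, y2) box that encloses both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def candidate_pairs(boxes):
    """All ordered (subject, object) index pairs with their union box."""
    return [
        ((i, j), union_box(boxes[i], boxes[j]))
        for i, j in permutations(range(len(boxes)), 2)
    ]


# Hypothetical bottom-up style rows: (x1, y1, x2, y2, w*h).
# Only the first four values are needed as box coordinates.
rows = [(10, 20, 50, 80, 2400), (30, 10, 90, 60, 3000), (0, 0, 20, 20, 400)]
boxes = [r[:4] for r in rows]
pairs = candidate_pairs(boxes)
```

With N boxes this yields N * (N - 1) ordered pairs, so 80 boxes give 6320 pairs, which matches the counts reported later in this thread.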

Einstone-rose commented 4 years ago

Thanks a lot. Now I have a clear idea of how to proceed with my work.

Einstone-rose commented 4 years ago

Also, I have two small questions.

(a) I tested custom images (MSCOCO 2014 train) with the SGDet model and got two files: custom_data_info.json and custom_prediction.json. custom_prediction.json contains the following fields (number of entries in parentheses): bbox (80), bbox_labels (80), bbox_scores (80), rel_pairs (6320), rel_labels (6320), rel_scores (6320), rel_all_scores (6320). For bbox_scores, is there an empirical threshold for filtering out low-confidence boxes (I don't work on object detection)? And what is the difference between rel_scores and rel_all_scores?

(b) If I want to turn on the ATTRIBUTE_HEAD, when should I turn it on: during Faster R-CNN detection training or during relation training (i.e. SGDet training)? Please give some details. Thanks a lot.

KaihuaTang commented 4 years ago

Yes, bbox_scores is used to filter the boxes, but not by a threshold; it's more like a top-K selection. The difference between rel_scores and rel_all_scores is that the first gives only the max score of each prediction, while the second gives the full softmax distribution. If you want to use the attribute head, you should turn it on during detection training, but the attribute head may contain some bugs (I didn't test it).
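As a rough illustration of the top-K idea and of how a single max score relates to a full distribution, here is a minimal sketch on made-up data. Only the key names come from the thread; the values, the helper names `top_k_boxes` and `filter_relations`, and the assumption that rel_pairs holds indices into the box list are all hypothetical. Whether the real rel_scores skips a background class when taking the max is not stated here, so this sketch takes the max over the whole distribution.

```python
# Toy stand-in for one image's predictions; all values are invented.
pred = {
    "bbox_scores": [0.9, 0.2, 0.6, 0.75],
    "rel_pairs": [[0, 3], [0, 1], [2, 3], [3, 0]],  # assumed: indices into the boxes
    "rel_all_scores": [  # one distribution over predicate classes per pair
        [0.1, 0.7, 0.2],
        [0.8, 0.1, 0.1],
        [0.3, 0.3, 0.4],
        [0.2, 0.2, 0.6],
    ],
}


def top_k_boxes(scores, k):
    """Indices of the k highest-scoring boxes, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def filter_relations(pred, k):
    """Keep relations whose subject and object are both among the top-k boxes."""
    keep = set(top_k_boxes(pred["bbox_scores"], k))
    kept = []
    for (subj, obj), dist in zip(pred["rel_pairs"], pred["rel_all_scores"]):
        if subj in keep and obj in keep:
            # Collapse the full distribution to its argmax label and max score.
            label = max(range(len(dist)), key=lambda c: dist[c])
            kept.append({"pair": (subj, obj), "label": label, "score": dist[label]})
    return kept


kept_boxes = top_k_boxes(pred["bbox_scores"], 3)
kept_rels = filter_relations(pred, 3)
```

Selecting by rank rather than by a fixed cutoff keeps the number of boxes per image constant, regardless of how confident the detector is on a given image.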

BAZINGAC commented 3 years ago

Hello, have you managed to implement this? Could I learn from your work? Thanks a lot!

WUSTxzy commented 2 years ago

Hello, I would like to know the total number of object categories in the scene graphs you generated with the MSCOCO 2014 dataset.

184446223 commented 1 year ago

Hello, have you managed to implement this? Could I learn from your work? Thanks a lot!

184446223 commented 1 year ago

Hello, have you managed to implement this? Could I learn from your work? Thanks a lot!

KaihuaTang / Scene-Graph-Benchmark.pytorch

How to generate the scene graph based on the bottom up features ? #74
