clin1223 / VLDet

[ICLR 2023] PyTorch implementation of VLDet (https://arxiv.org/abs/2211.14843)

How did you train your Region Proposal Network? #14

Open JiuqingDong opened 1 year ago

JiuqingDong commented 1 year ago

Hi, thank you for your amazing work! I would like to know how you trained your Region Proposal Network (RPN). In Section 1, you say, "We introduce an open-vocabulary object detector method to learn object-language alignments directly from image-text pair data." This sounds as if no bounding-box annotations were used. However, in Section 3.1, you say, "our goal is to build an object detector, trained on a dataset with base-class bounding box annotations and a dataset of image-caption pairs 〈 I, C 〉 associated with a large vocabulary C_open". This sounds as if some bounding boxes were used for supervision.

These two statements seem to conflict, which confuses me. My guess is that you used the ground-truth bounding boxes of the base classes to train the RPN.

I look forward to your reply. Thank you very much.