Does VLDet support image-text retrieval? For example, I want to give a text query and retrieve the best-matching image. If the model supports this, should I use an image-level embedding or the per-instance embeddings? As far as I understand, I should use
proj_x = self.linear(input_x) [VLDet/vldet/modeling/roi_heads/zero_shot_classifier.py, line 98] as the image/instance embedding. Is that correct?
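
To make the question concrete, the retrieval I have in mind could be sketched roughly as follows. This is only an illustration with made-up vectors and it does not call VLDet. The names `cosine`, `image_score`, and `retrieve` are hypothetical, and the sketch assumes that each image yields a list of per-instance embeddings (for example the `proj_x` rows) living in the same space as the text embedding, which is exactly the part I am unsure about.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def image_score(text_emb, instance_embs):
    """Score an image by its best-matching instance (max over instances)."""
    return max((cosine(text_emb, e) for e in instance_embs),
               default=float("-inf"))

def retrieve(text_emb, gallery):
    """Return the key of the image whose best instance matches the text most."""
    return max(gallery, key=lambda name: image_score(text_emb, gallery[name]))

# Toy data: placeholder vectors standing in for real embeddings.
text_emb = [1.0, 0.0, 0.0]
gallery = {
    "img_a.jpg": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.2]],
    "img_b.jpg": [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]],
}
best = retrieve(text_emb, gallery)
print(best)
```

Taking the maximum over instances is just one possible choice. Averaging the instance embeddings into a single image-level vector would be the alternative I am asking about.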