Open Lucky-Light-Sun opened 4 months ago
The XCLIP retrieval code is held by my previous company, and I left a long time ago, so it is difficult for me to access it. In addition, that code was based on MMCV 1.0 and is incompatible with the current version. Replicating it is simple, though. We did not use XCLIP's prompting or MIT modules; we used only the CCT module, which inserts message tokens into the backbone. Only slight modifications to CLIP's ViT block are needed; see the CrossFramelAttentionBlock here.
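For readers who want a starting point, here is a minimal PyTorch sketch of the message-token idea described above. It is not the original retrieval code: the class name `CrossFrameBlockSketch`, its constructor arguments, and the sequence-first `(tokens, batch * frames, dim)` layout are assumptions made for illustration, and details such as dropout, attention masks, and CLIP's exact MLP are omitted.

```python
import torch
import torch.nn as nn


class CrossFrameBlockSketch(nn.Module):
    """Hypothetical ViT block with per-frame message tokens (illustrative only)."""

    def __init__(self, d_model: int, n_head: int, num_frames: int):
        super().__init__()
        self.num_frames = num_frames
        # Layers that build the message tokens and let them attend across frames.
        self.message_fc = nn.Linear(d_model, d_model)
        self.message_ln = nn.LayerNorm(d_model)
        self.message_attn = nn.MultiheadAttention(d_model, n_head)
        # Standard pre-norm ViT block layers.
        self.ln_1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_head)
        self.ln_2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (L, B*T, D), sequence-first, frames of one video stored contiguously.
        l, bt, d = x.shape
        b = bt // self.num_frames

        # 1) One message token per frame, derived from that frame's class token.
        msg = self.message_fc(x[0])                              # (B*T, D)
        msg = msg.view(b, self.num_frames, d).permute(1, 0, 2)   # (T, B, D)

        # 2) Message tokens attend to each other across the frames of a video.
        m = self.message_ln(msg)
        msg = msg + self.message_attn(m, m, m, need_weights=False)[0]
        msg = msg.permute(1, 0, 2).reshape(1, bt, d)             # (1, B*T, D)

        # 3) Append the message token to each frame's tokens and run the
        #    usual spatial self-attention within the frame.
        x = torch.cat([x, msg], dim=0)                           # (L+1, B*T, D)
        h = self.ln_1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]

        # 4) Drop the message token again, then apply the MLP.
        x = x[:l]
        x = x + self.mlp(self.ln_2(x))
        return x
```

A block like this would replace the plain residual attention blocks in CLIP's visual transformer. How the per-frame outputs are pooled into a video embedding for retrieval (for example, averaging the class tokens over frames) is a separate choice that this sketch does not cover.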
Hi, I noticed that XCLIP (Intermediate structure) is designed for the RECOGNITION task, and the official code does not cover the retrieval task. So I want to ask: how did you get the X-CLIP retrieval@1 metric? If you ran the experiment yourself, could you please share the code? Otherwise, please point me to the paper and code you referred to. Looking forward to your reply.
Best wishes!