anosorae / IRRA

Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval (CVPR 2023)
MIT License

Is using a pre-trained CLIP model unfair when compared with some other methods in Table 1? #24

Open maoSuxi opened 8 months ago

maoSuxi commented 8 months ago

Sorry, a small question: most baseline methods in Table 1 use RN50 or ViT as the backbone. I think it would be better to also report performance with an RN50 or ViT backbone, so that the comparison excludes the extra benefit that large pre-trained models bring.