anosorae / IRRA

Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval (CVPR 2023)
MIT License

Are you training the CLIP model from scratch, or are you using the pretrained weights? #39

Open Maram-Helmy opened 4 months ago

Maram-Helmy commented 4 months ago

I'm really confused: in the code you initialize the layers using a normal distribution, but what I understood from the paper is that you are using the CLIP model.

Your answer will really help me understand.

Thanks

hoahoa1808 commented 4 months ago

As I read the code, the CORE_MODEL can be split into 2 parts: