ABaldrati / CLIP4Cir

[ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features
MIT License
159 stars 16 forks

Training on pant/shoe/accessories image? #4

Closed pntt3011 closed 1 year ago

pntt3011 commented 1 year ago

Hi, thank you for your amazing work. I have run your demo and really liked it.

I just have a question: when I play with your demo, I can even query for things like pants or accessories, and the results are rather good.

I know that the FashionIQ images folder contains these images, but the caption files do not include corresponding descriptions for them (for example, B0056CW2RG.jpg).

How can their features be learnt during training?

ABaldrati commented 1 year ago

Hi, thanks for your interest in our work!!

During the training phase, we learn an "additive" transformation that combines the CLIP image and text features meaningfully for the composed image retrieval task. Since CLIP has been trained on a massive amount of data, it "knows" a lot of concepts, so the system is able to generalize to other fashion items like pants or accessories (i.e., we mainly learn the transformation on top of CLIP's already meaningful features).
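For intuition, the idea described above can be sketched roughly as follows. This is a simplified, hypothetical PyTorch sketch, not the repository's actual combiner module; all class and variable names here are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdditiveCombiner(nn.Module):
    """Toy fusion module: combines a reference-image CLIP feature and a
    caption CLIP feature into a single query embedding (illustrative only)."""

    def __init__(self, dim: int = 512):
        super().__init__()
        # Learned projections applied before the additive fusion
        self.image_proj = nn.Linear(dim, dim)
        self.text_proj = nn.Linear(dim, dim)

    def forward(self, img_feat: torch.Tensor, txt_feat: torch.Tensor) -> torch.Tensor:
        # "Additive" combination of the two projected CLIP features
        fused = self.image_proj(img_feat) + self.text_proj(txt_feat)
        # L2-normalize so retrieval can use cosine similarity
        return F.normalize(fused, dim=-1)


# Stand-ins for frozen CLIP features (batch of 4, 512-d, unit norm)
img = F.normalize(torch.randn(4, 512), dim=-1)
txt = F.normalize(torch.randn(4, 512), dim=-1)

combiner = AdditiveCombiner(dim=512)
query = combiner(img, txt)
print(query.shape)  # batch of fused query embeddings, shape (4, 512)
```

Because CLIP itself is kept as the feature extractor, only the small fusion module is trained, which is why the system can still handle item categories (pants, accessories) that lack captions in the training annotations.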

If you need more info do not hesitate to ask!

pntt3011 commented 1 year ago

@ABaldrati thank you for your reply, I'll close this issue now!