Closed: pntt3011 closed this issue 1 year ago
Hi, thanks for your interest in our work!!
During the training phase, we learn an "additive" transformation that combines the CLIP features meaningfully for the composed image retrieval task. Since CLIP has been trained on a massive amount of data, it "knows" a lot of concepts, so the system is able to generalize to other fashion items such as pants or accessories (i.e., we mainly learn the transformation by relying on CLIP's meaningful features).
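For intuition, here is a minimal, hypothetical NumPy sketch of such an additive combiner. The function name, the concatenation-plus-projection form, the `tanh` nonlinearity, and the weight shapes are all illustrative assumptions for this answer, not the actual implementation:

```python
import numpy as np

def combine_features(img_feat, txt_feat, W, b):
    # Hypothetical sketch: a learned residual is added on top of the
    # CLIP image feature, conditioned on both modalities.
    delta = np.tanh(np.concatenate([img_feat, txt_feat]) @ W + b)
    fused = img_feat + delta           # "additive" combination
    return fused / np.linalg.norm(fused)  # re-normalize for cosine retrieval

# Toy usage with random stand-ins for CLIP features and learned weights
rng = np.random.default_rng(0)
dim = 512
img_feat = rng.standard_normal(dim)
txt_feat = rng.standard_normal(dim)
W = rng.standard_normal((2 * dim, dim)) * 0.01
b = np.zeros(dim)
fused = combine_features(img_feat, txt_feat, W, b)
```

The key point is that the learned part is only the small transformation on top of frozen CLIP features, which is why concepts CLIP already knows (pants, accessories) transfer without caption supervision for every item.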
If you need more info do not hesitate to ask!
@ABaldrati thank you for your reply, I'll close this issue now!
Hi, thank you for your amazing work. I have run your demo and really liked it.
I just have a question. When I play with your demo, I can even query things like pants or accessories and the results are rather good.
I know that the `images` folder of FashionIQ contains these images, but the `captions` files do not have their corresponding descriptions (for example `B0056CW2RG.jpg`). How can their features be learnt during training?