Sara-Ahmed / SiT

Self-supervised vIsion Transformer (SiT)
why 3 tasks/objectives? #29

Closed mywebinfo65536 closed 2 years ago

mywebinfo65536 commented 2 years ago

Hi, I have been reading your paper, but I don't really understand why you set three tasks/objectives (image reconstruction, contrastive prediction, and rotation prediction). What is the purpose of each? Thanks.

Sara-Ahmed commented 2 years ago

Thanks for your interest in our work. The main task is GMML; please refer to the "GMML is all you need" paper for more insight. The SimCLR objective gives a global representation that enforces a notion of similarity/dissimilarity between positive and negative pairs. Rotation is a pretext task that encourages the network to understand the structure of objects: if the network correctly classifies an object as upside down, it understands what the object should look like. That said, rotation can be incorporated into the SimCLR task, and this is actually what we did in the recent version (the repo will be updated soon).
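
For readers who want to see how three such objectives can sit side by side, here is a minimal pure-Python sketch. It is not the SiT code: the function names (`rotate90`, `l1_reconstruction_loss`, `nt_xent_loss`, `total_loss`), the L1 reconstruction error, the temperature of 0.5, and the fixed loss weights are all assumptions made for illustration. The repository may combine and weight the terms differently.

```python
import math

def rotate90(img, k):
    """Rotate a 2-D list-of-lists image counter-clockwise by k * 90 degrees."""
    for _ in range(k % 4):
        img = [list(row) for row in zip(*img)][::-1]
    return img

def l1_reconstruction_loss(pred, target):
    """Mean absolute error between a reconstructed image and the clean image."""
    flat_p = [v for row in pred for v in row]
    flat_t = [v for row in target for v in row]
    return sum(abs(p - t) for p, t in zip(flat_p, flat_t)) / len(flat_t)

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (numerically stabilised)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style contrastive loss; z1[i] and z2[i] embed two views of sample i."""
    z = z1 + z2
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same sample
        logits = [cosine(z[i], z[j]) / temperature
                  for j in range(2 * n) if j != i]
        label = pos if pos < i else pos - 1  # shift because entry i was dropped
        total += cross_entropy(logits, label)
    return total / (2 * n)

def total_loss(recon, target, rot_logits, rot_label, z1, z2,
               weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three objectives (fixed weights are an assumption)."""
    w_rec, w_rot, w_con = weights
    return (w_rec * l1_reconstruction_loss(recon, target)
            + w_rot * cross_entropy(rot_logits, rot_label)
            + w_con * nt_xent_loss(z1, z2))
```

In this sketch the rotation label is simply the `k` passed to `rotate90` (0 to 3), so the rotation head is a 4-way classifier, while the contrastive term only looks at the global embeddings `z1` and `z2`. That separation mirrors the explanation above: reconstruction works on local content, rotation on object structure, and the contrastive term on a global notion of similarity.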