As shown in Figure 1 (b), if the representations of any two image-text pairs, (I_dog, T_dog) and (I_cat, T_cat), exactly satisfy both forms of cyclic consistency, then we can guarantee that any test image I_test respects the ordering of distances in both image and text spaces (i.e., if d(I_test, I_dog) > d(I_test, I_cat), then d(I_test, T_dog) > d(I_test, T_cat)).
Could you please provide a proof or explanation? Thanks!
Thanks for your great work and for making it public!