Closed: deepfates closed this issue 3 years ago
CLIP encodes images and text into the same latent space, so it should be easy to add an image input and then search either on the image embedding alone or on a combined image+text vector.
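A rough sketch of what the combined query could look like, not the code from this repo. It assumes the image and text embeddings have already been produced by a CLIP model's image and text encoders and are plain lists of floats. The names `combine_query`, `search`, and `image_weight`, and the toy three-dimensional vectors, are made up for illustration. Blending the two normalized vectors with a weighted average is one simple choice, not the only one.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def combine_query(image_vec=None, text_vec=None, image_weight=0.5):
    """Build one query vector from an image embedding, a text embedding, or both.

    image_weight is a hypothetical knob: 1.0 means image only, 0.0 means text only.
    """
    if image_vec is None and text_vec is None:
        raise ValueError("need at least one of image_vec or text_vec")
    if text_vec is None:
        return normalize(image_vec)
    if image_vec is None:
        return normalize(text_vec)
    img, txt = normalize(image_vec), normalize(text_vec)
    mixed = [image_weight * i + (1 - image_weight) * t for i, t in zip(img, txt)]
    return normalize(mixed)


def search(query_vec, index, top_k=3):
    """Rank indexed items by cosine similarity to the query, best first."""
    scored = [
        (sum(q * d for q, d in zip(query_vec, normalize(vec))), name)
        for name, vec in index.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Toy stand-ins for real CLIP embeddings (assumption: real ones are much longer).
index = {
    "cat_photo": [1.0, 0.0, 0.0],
    "dog_photo": [0.0, 1.0, 0.0],
    "cat_drawing": [0.7, 0.0, 0.7],
}
image_vec = [1.0, 0.0, 0.0]  # pretend embedding of an uploaded cat photo
text_vec = [0.0, 0.0, 1.0]   # pretend embedding of the text "a drawing"

image_only = search(combine_query(image_vec=image_vec), index, top_k=1)
image_and_text = search(combine_query(image_vec, text_vec), index, top_k=1)
```

With these toy vectors the image-only query ranks `cat_photo` first, while adding the text vector pulls `cat_drawing` to the top.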
Added in 678cd5e7c2864fa6fb7c3dff5516227d301f73a4