MindEye can map to any multimodal latent space like CLIP, OpenCLIP, AlexNet, ImageBind, etc. Can you improve our retrieval and/or reconstruction evaluation metrics by either (1) mapping to a different embedding space or (2) combining multiple models mapped to different embedding spaces?
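For idea (2), one simple starting point is late fusion: run a separately trained model per embedding space, compute a cosine-similarity matrix between brain-derived and image embeddings in each space, and average the matrices before ranking. The sketch below is a minimal, hypothetical illustration with synthetic arrays standing in for the per-space embeddings; `ensemble_retrieval` and its inputs are assumptions, not part of the MindEye codebase.

```python
import numpy as np


def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (N, D) and b (M, D)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def ensemble_retrieval(brain_spaces, image_spaces):
    """Top-1 image retrieval by averaging similarity matrices across spaces.

    brain_spaces / image_spaces: lists of (N, D_i) arrays, one pair per
    embedding space (e.g. CLIP and OpenCLIP); D_i may differ per space.
    """
    sims = [cosine_sim(b, im) for b, im in zip(brain_spaces, image_spaces)]
    mean_sim = np.mean(sims, axis=0)  # shapes match: each sim is (N, N)
    return mean_sim.argmax(axis=1)    # predicted image index per brain sample


# Synthetic demo: two embedding spaces with different dimensionalities.
rng = np.random.default_rng(0)
n = 8
imgs_a = rng.normal(size=(n, 16))   # stand-in for CLIP image embeddings
imgs_b = rng.normal(size=(n, 32))   # stand-in for OpenCLIP image embeddings
# Brain-derived embeddings: matching image embedding plus small noise.
brain_a = imgs_a + 0.01 * rng.normal(size=imgs_a.shape)
brain_b = imgs_b + 0.01 * rng.normal(size=imgs_b.shape)

preds = ensemble_retrieval([brain_a, brain_b], [imgs_a, imgs_b])
```

A weighted average (weighting each space by its held-out retrieval accuracy) is a natural next step over the uniform mean used here.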