Closed: broadwell closed this issue 4 years ago
I like that idea a lot. I've also been thinking about an embedding space that encodes both text and images. Imagine taking Wikipedia articles and, for each one, trying to predict an ImageNet category. Then one could train a single space that encodes both text and images...
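One simple way to sketch that idea: learn a linear map from the text space into the image space using paired examples, so articles and images become comparable in one space. This is just a toy illustration with random stand-in vectors (the names `text_vecs` and `image_vecs` are hypothetical; in practice they would come from a text model and an image model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 100 "article" vectors in a 300-d text space, and for each
# article the embedding of its ImageNet category in a 512-d image space.
text_vecs = rng.normal(size=(100, 300))
image_vecs = rng.normal(size=(100, 512))

# Learn a linear map W from the text space into the image space by least
# squares, so both modalities can be compared in the shared image space.
W, *_ = np.linalg.lstsq(text_vecs, image_vecs, rcond=None)

projected = text_vecs @ W  # articles now live in the image space
assert projected.shape == (100, 512)
```

A real version would train jointly (e.g. with a contrastive objective) rather than a one-shot linear fit, but the shared-space idea is the same.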
Yeah, that would be cool to do, and most importantly, I think it would look really cool.
I've also obtained good results by generating a merged word embedding model with the repo above (though it involved some manual editing of the .w2v files) and then loading the aligned model via the --model option.
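For reference, the manual merge can be scripted. Here's a minimal sketch that assumes the .w2v files use the plain word2vec text format (a "count dim" header line, then one "word v1 v2 ..." line per word); the function name is my own:

```python
def merge_w2v(path_a, path_b, out_path):
    """Merge two word2vec text-format files into one.

    Assumes both files share the same vector dimensionality.
    When a word appears in both files, the first file's vector wins.
    """
    entries = {}
    dim = None
    for path in (path_a, path_b):
        with open(path, encoding="utf-8") as f:
            header = f.readline().split()
            if dim is None:
                dim = int(header[1])
            elif int(header[1]) != dim:
                raise ValueError("dimension mismatch between models")
            for line in f:
                word, _, vec = line.rstrip("\n").partition(" ")
                entries.setdefault(word, vec)  # keep first occurrence
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(f"{len(entries)} {dim}\n")
        for word, vec in entries.items():
            f.write(f"{word} {vec}\n")
```

The resulting file can then be passed to the --model option as described above.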
This would require quite a bit of build-out, but it would be really cool. https://github.com/artetxem/vecmap implements several methods that purport to cross-map embeddings between spaces.
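The core of one such cross-mapping approach is orthogonal Procrustes: given a seed dictionary of row-aligned source/target vectors, find the rotation that best maps one space onto the other. A minimal sketch (not vecmap's actual code, just the underlying technique):

```python
import numpy as np

def orthogonal_map(src, trg):
    """Orthogonal Procrustes: the rotation W minimizing ||src @ W - trg||_F,
    where src and trg are row-aligned embedding matrices (a seed dictionary).
    """
    u, _, vt = np.linalg.svd(trg.T @ src)
    return vt.T @ u.T  # W is orthogonal: W.T @ W = I

rng = np.random.default_rng(0)
src = rng.normal(size=(50, 8))
# Build a target space that is an exact rotation of the source,
# so the recovered map should reproduce that rotation.
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
trg = src @ q

W = orthogonal_map(src, trg)
assert np.allclose(src @ W, trg, atol=1e-6)
```

Restricting the map to be orthogonal preserves dot products and distances within the source space, which is why it works well for aligning independently trained embeddings.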