AmitMY opened 4 years ago
Let K be the number of words on the board. Let d be the embedding dimension. Let V be the vocabulary size.
We multiply the K × d matrix with the transpose of the (V − K) × d matrix. This yields a K × (V − K) matrix, where each row is the similarity of a board word to all other words in the vocabulary.
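A minimal sketch of this step, assuming cosine similarity over row-normalized embeddings (the variable names and the random embeddings are placeholders; real embeddings would come from a pretrained model):

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, d = 4, 100, 16  # board words, vocabulary size, embedding dimension

# Placeholder embeddings standing in for a pretrained embedding table.
board_emb = rng.normal(size=(K, d))      # K x d: words on the board
rest_emb = rng.normal(size=(V - K, d))   # (V-K) x d: rest of the vocabulary

# Normalize rows so the dot product below is cosine similarity.
board_emb /= np.linalg.norm(board_emb, axis=1, keepdims=True)
rest_emb /= np.linalg.norm(rest_emb, axis=1, keepdims=True)

# K x (V-K): row i holds the similarity of board word i to every candidate clue.
sim = board_emb @ rest_emb.T
print(sim.shape)  # (4, 96)
```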
We generate a binary matrix of 2^K by K, which represents all possible combinations of board words. When we multiply this binary matrix by the larger matrix, we get, for each combination, the summed similarity of its words (i.e., the similarity of the average word vector, up to scale) to all other words in the vocabulary.
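One way to build that 2^K × K combination matrix is from the bit patterns of 0..2^K−1; here `sim` is a random stand-in for the K × (V − K) similarity matrix described above:

```python
import numpy as np

K = 4
# 2^K x K binary matrix: row c is the bit pattern of c, marking which
# board words are included in combination c.
combos = (np.arange(2 ** K)[:, None] >> np.arange(K)) & 1  # shape (16, 4)

# Random stand-in for the K x (V-K) board-word similarity matrix.
sim = np.random.default_rng(0).normal(size=(K, 8))

# 2^K x (V-K): summed similarity of each combination's words to every clue.
combo_sim = combos @ sim
print(combo_sim.shape)  # (16, 8)
```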
We compensate for combinations with more words, for example by multiplying their scores by the number of words in the combination.
Then we take the word with the highest similarity, which corresponds to an element in the large matrix. If that word is more similar to any of the "bad" words than to any of the words in our combination, we move on to the next word; otherwise, we pick it.
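The selection step might be sketched like this (all matrices here are random placeholders; `combo_words_sim` and `bad_sim` stand for the clue similarities of the chosen combination's words and of the "bad" words):

```python
import numpy as np

rng = np.random.default_rng(1)
n_clues = 10
combo_words_sim = rng.random((2, n_clues))  # combo words x candidate clues
bad_sim = rng.random((3, n_clues))          # "bad" words x candidate clues

# Aggregate score per candidate clue (here: summed similarity to the combo).
scores = combo_words_sim.sum(axis=0)

clue = None
for idx in np.argsort(scores)[::-1]:  # best candidate first
    # Reject the clue if any bad word is closer to it than the
    # least-similar word of the combination is.
    if bad_sim[:, idx].max() < combo_words_sim[:, idx].min():
        clue = int(idx)
        break
print(clue)
```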
This last step is an iterative process (O(KV)) rather than a purely mathematical one (O(K·K)).
It could be done mathematically if need be; if this part turns out to be slow, we'll formulate a closed-form version.
I recommend using JAX ( https://github.com/google/jax ), imported as `import jax.numpy as np`, because it's NumPy-compatible and runs on GPU (so the compute time is essentially negligible).
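To illustrate the suggested drop-in swap: the code below uses plain NumPy so it runs without a GPU, but changing only the import to `jax.numpy` (as recommended) leaves the rest of the code unchanged.

```python
# The suggestion is a one-line swap; everything else stays the same:
#   import jax.numpy as np   # instead of: import numpy as np
import numpy as np  # NumPy shown here so this sketch runs anywhere

a = np.arange(6.0).reshape(2, 3)
b = np.ones((3, 2))
print(a @ b)  # matrix multiplication works identically under jax.numpy
```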
Nice, will check that. I wonder what the time difference is between running this on CPU and GPU. Could it be that the memory transfer to the GPU would take a lot of time, since we need to transfer some memory on each turn?