I'm studying 5_3_Document_Classification_with_CNN.
The make_embedding_matrix helper's docs say it should be fed a list of words from the dataset. However, for the embedding matrix to return the correct pretrained embedding for a word, the word list must be in the same order as the vocabulary, and the vocabulary's indices must be contiguous with no gaps. These are big assumptions.
I think the correct way to construct the embedding matrix is to pass the vocab to make_embedding_matrix and use the vocab's token_to_idx mapping to decide which row of the matrix each word's pretrained vector should populate.
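Something like this sketch, assuming the Vocabulary exposes a token_to_idx dict (the function signature and the random-init fallback for out-of-vocabulary words are my own illustration, not the notebook's code):

```python
import numpy as np

def make_embedding_matrix(word_to_vec, token_to_idx, embedding_size):
    """Build an embedding matrix whose rows line up with the vocab's indices.

    word_to_vec: dict of word -> pretrained vector (e.g. loaded from GloVe)
    token_to_idx: dict of token -> row index, taken directly from the Vocabulary
    """
    vocab_size = max(token_to_idx.values()) + 1
    embeddings = np.zeros((vocab_size, embedding_size), dtype=np.float32)
    for token, idx in token_to_idx.items():
        if token in word_to_vec:
            # place the pretrained vector at the vocab's own index for this token
            embeddings[idx] = word_to_vec[token]
        else:
            # tokens without a pretrained vector get a small random init
            embeddings[idx] = np.random.uniform(-0.25, 0.25, embedding_size)
    return embeddings

# toy usage with made-up 2-dimensional "pretrained" vectors
word_to_vec = {"cat": np.array([1.0, 2.0]), "dog": np.array([3.0, 4.0])}
token_to_idx = {"<unk>": 0, "cat": 1, "dog": 2}
emb = make_embedding_matrix(word_to_vec, token_to_idx, embedding_size=2)
```

This way row order comes from token_to_idx itself, so the caller never has to hand over a word list in exactly the vocab's order, and gaps in the indices simply stay as zero/random rows.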
Correct me if I'm wrong.