Closed kevinmartinjos closed 4 years ago
Right now, when creating an embedding matrix, random embeddings are used for OOV words. `zerounk` would set OOV words to zero. Both are extremes.
Here's a SimilarityMatrix that handles this.
The idea is that OOV terms get negative indices, padding=0, and in-vocab terms get positive indices.
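A minimal sketch of that indexing scheme, assuming the vocabulary is a plain Python dict. `build_index` and `encode` are hypothetical names used for illustration only; they are not the actual SimilarityMatrix API.

```python
PAD_INDEX = 0  # padding always maps to 0


def build_index(vocab_terms):
    """Map in-vocab terms to positive indices 1..N (0 is reserved for padding)."""
    return {term: i for i, term in enumerate(vocab_terms, start=1)}


def encode(tokens, stoi, oov_stoi, length):
    """Encode tokens: in-vocab -> positive, OOV -> negative, padding -> 0.

    oov_stoi is updated in place so a repeated OOV term keeps the same index.
    """
    ids = []
    for tok in tokens[:length]:
        if tok in stoi:
            ids.append(stoi[tok])
        else:
            # The first unseen term gets -1, the next -2, and so on.
            ids.append(oov_stoi.setdefault(tok, -(len(oov_stoi) + 1)))
    # Right-pad with zeros up to the requested length.
    ids.extend([PAD_INDEX] * (length - len(ids)))
    return ids


stoi = build_index(["the", "cat", "sat"])
oov_stoi = {}
encoded = encode(["the", "zyx", "cat", "zyx", "qwe"], stoi, oov_stoi, length=7)
print(encoded)  # [1, -1, 2, -1, -2, 0, 0]
```

With this layout the sign of an index alone says whether a term is in-vocab, OOV, or padding, so downstream code can branch on it without a separate mask.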
Consequences:
Solution: For OOV words, (… `zerounk`, which always sets it to 0)
`stoi`
- The pymagnitude embedding already has a vocabulary. Simply add OOV terms to this vocab.
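
A rough sketch of adding OOV terms to an existing vocabulary. It assumes the vocabulary is available as a plain `stoi` dict (term to index); how pymagnitude actually exposes its vocabulary is not shown here, and `extend_vocab` is a hypothetical helper.

```python
def extend_vocab(stoi, terms):
    """Return a copy of stoi in which unseen terms get the next free indices."""
    extended = dict(stoi)
    next_index = max(extended.values(), default=0) + 1
    for term in terms:
        if term not in extended:
            extended[term] = next_index
            next_index += 1
    return extended


stoi = {"the": 1, "cat": 2}
extended = extend_vocab(stoi, ["cat", "zyx", "qwe", "zyx"])
print(extended)  # {'the': 1, 'cat': 2, 'zyx': 3, 'qwe': 4}
```

Existing terms keep their indices, so embeddings already looked up for in-vocab words stay valid after the vocabulary grows.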