Closed jbingel closed 6 years ago
check OOV rate for targets. if high, perhaps use BPE embeddings. Text to BPE in Python like this: https://stackoverflow.com/questions/8870261/how-to-split-text-without-spaces-into-list-of-words
implemented this in https://github.com/bjerva/cwi18/commit/fcdb090bdfefe4921f9c16327c4f3e15e4bdb4ca, but takes forever, perhaps gotta compute offline and load from textfile
On first data inspection, one intuition was that the similarity between a target word/phrase and the rest of its containing sentence could be somewhat predictive, essentially modeling surprisal of the word in context.