jmlongriver closed 5 years ago
Intrinsic evaluation would be great, but we don't yet have any convenience methods in Flair for this. The "problem" is that a word gets a different embedding depending on its context, so we have to compute a separate embedding for every occurrence of every word in a corpus. That quickly becomes a huge amount of data.
However, in our paper (see Table 4), we did compute nearest neighbors this way for qualitative analysis, which gives interesting results.
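For readers who want to try this kind of qualitative nearest-neighbor lookup themselves, here is a minimal sketch. It is not the procedure used in the paper and not a Flair API: `build_index`, `nearest_neighbors`, and `flair_embed_factory` are hypothetical helper names, cosine similarity is an assumed choice of metric, and the brute-force search only suits small corpora. The Flair part assumes `FlairEmbeddings`, `Sentence`, and `token.embedding` behave as in recent Flair releases, and `"news-forward"` is just an example model name.

```python
import math
from typing import Callable, List, Sequence, Tuple

# One (token text, vector) pair per token of a sentence.
TokenVectors = List[Tuple[str, List[float]]]
EmbedFn = Callable[[str], TokenVectors]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(corpus: List[str], embed: EmbedFn):
    """Store one entry per token occurrence: (sentence index, token text, vector).

    This is the step that grows quickly: every occurrence gets its own vector.
    """
    index = []
    for s_idx, text in enumerate(corpus):
        for tok, vec in embed(text):
            index.append((s_idx, tok, vec))
    return index


def nearest_neighbors(query_sentence: str, query_position: int,
                      corpus: List[str], index, embed: EmbedFn, k: int = 3):
    """Return the top-k (score, sentence, token) for the token at query_position.

    Occurrences from the query sentence itself are skipped.
    """
    _, q_vec = embed(query_sentence)[query_position]
    scored = [
        (cosine(q_vec, vec), corpus[s_idx], tok)
        for s_idx, tok, vec in index
        if corpus[s_idx] != query_sentence
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def flair_embed_factory(model_name: str = "news-forward") -> EmbedFn:
    """Hypothetical adapter: wraps Flair so it matches EmbedFn (assumes Flair is installed)."""
    from flair.data import Sentence
    from flair.embeddings import FlairEmbeddings

    embedding = FlairEmbeddings(model_name)

    def embed(text: str) -> TokenVectors:
        sentence = Sentence(text)
        embedding.embed(sentence)  # attaches a contextual vector to each token
        return [(token.text, token.embedding.tolist()) for token in sentence]

    return embed
```

With Flair installed, usage would look roughly like `embed = flair_embed_factory()`, then `index = build_index(corpus, embed)` and `nearest_neighbors(corpus[0], 2, corpus, index, embed)`. To return only sentences that contain the same word, one could additionally filter the index on `tok` matching the query token.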
Thanks for your answer. I understand that it would create a huge computational burden. I liked the examples you gave in the paper.
Great, let me know if you have any further questions! I'll close this issue for now but feel free to reopen!
I am not sure whether there is functionality to evaluate contextualized word embeddings intrinsically. As with classical word embeddings, each word could have a nearest-neighbor list ranked by similarity.
Given a test corpus and a sentence within the corpus, can Flair provide an API to calculate the nearest neighbor of a word in that sentence? The output should also be another sentence containing that word.
An example can be found in these slides (https://go.weblife.io/browser?url=https%3A%2F%2Fwww.slideshare.net%2Fshuntaroy%2Fa-review-of-deep-contextualized-word-representations-peters-2018)
Thanks