flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

How to calculate the nearest neighbor word for contextualized word embeddings? #569

Closed jmlongriver closed 5 years ago

jmlongriver commented 5 years ago

I am not sure whether there is functionality to evaluate contextualized word embeddings intrinsically. With classical word embeddings, each word has a nearest-neighbor list ranked by similarity.

Given a test corpus and a sentence within that corpus, can Flair provide an API to calculate the nearest neighbors of a word in the sentence? The output should be another sentence that contains that word.

An example can be found in these slides (https://go.weblife.io/browser?url=https%3A%2F%2Fwww.slideshare.net%2Fshuntaroy%2Fa-review-of-deep-contextualized-word-representations-peters-2018)

Thanks

alanakbik commented 5 years ago

Intrinsic evaluation would be great, but we don't yet have any convenience methods in Flair for this. The "problem" is that words get different embeddings depending on context, meaning that we have to compute a separate embedding for every occurrence of every word in a corpus, and that quickly becomes a huge amount of data.

However, in our paper (see Table 4), we did compute nearest neighbors this way for qualitative analysis, which gives interesting results.
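For readers who want to try this themselves, here is a minimal sketch of the brute-force approach described above: embed every token occurrence in the corpus, then rank all occurrences by cosine similarity to a query token. It is not a Flair API. The helper names (`build_index`, `nearest_neighbors`, `toy_embed`) are hypothetical, and `toy_embed` is only a stand-in so that the example runs without downloading a model. With Flair you would presumably replace it with a function that calls `embeddings.embed(sentence)` and reads each `token.embedding`.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Entry = Tuple[int, int, Vector]  # (sentence index, token index, vector)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_index(
    corpus: List[List[str]],
    embed_sentence: Callable[[List[str]], List[Vector]],
) -> List[Entry]:
    """Store one vector per token occurrence (this is the expensive part)."""
    index: List[Entry] = []
    for s_id, tokens in enumerate(corpus):
        for t_id, vec in enumerate(embed_sentence(tokens)):
            index.append((s_id, t_id, vec))
    return index


def nearest_neighbors(
    query_vec: Vector,
    index: List[Entry],
    k: int = 3,
    exclude: Optional[Tuple[int, int]] = None,
) -> List[Tuple[float, int, int]]:
    """Return the k most similar occurrences as (score, sentence, token)."""
    scored = [
        (cosine(query_vec, vec), s_id, t_id)
        for s_id, t_id, vec in index
        if (s_id, t_id) != exclude  # skip the query occurrence itself
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def toy_embed(tokens: List[str]) -> List[Vector]:
    """Placeholder embedder (an assumption, NOT Flair): each vector depends
    on the token and on its left and right neighbors."""
    vectors = []
    for i, tok in enumerate(tokens):
        prev_tok = tokens[i - 1] if i > 0 else ""
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
        vectors.append([float(len(tok)), float(len(prev_tok)), float(len(next_tok))])
    return vectors


if __name__ == "__main__":
    corpus = [
        ["I", "live", "in", "Washington"],
        ["Washington", "was", "the", "first", "president"],
    ]
    index = build_index(corpus, toy_embed)
    query = toy_embed(corpus[0])[3]  # "Washington" in the first sentence
    for score, s_id, t_id in nearest_neighbors(query, index, k=3, exclude=(0, 3)):
        print(round(score, 3), corpus[s_id][t_id], "|", " ".join(corpus[s_id]))
```

Because the index holds one vector per token occurrence, memory grows with corpus size times embedding dimension, which is the "huge amount of data" mentioned above. For anything beyond a small corpus, an approximate nearest-neighbor index would likely be needed instead of this linear scan.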

jmlongriver commented 5 years ago

Thanks for your answer. I understand that it would create a huge computational burden. I liked the examples you gave in the paper.

alanakbik commented 5 years ago

Great, let me know if you have any further questions! I'll close this issue for now but feel free to reopen!