Closed: elie-h closed this issue 5 years ago
Technically, the code looks good. Here are some other comparisons with BERT and ELMo:
LM | Sentence | Similarity |
---|---|---|
BERT (bert-base-uncased) | George Washington addressed his supporters | 0.6652 |
BERT (bert-base-uncased) | Taking a flight to Washington tonight | 0.6186 |
BERT (bert-base-uncased) | Arkansaw is a lovely state | 0.5656 |
BERT (bert-base-uncased) | George Washington was a great president | 0.6955 |
BERT (bert-base-cased) | George Washington addressed his supporters | 0.8641 |
BERT (bert-base-cased) | Taking a flight to Washington tonight | 0.8477 |
BERT (bert-base-cased) | Arkansaw is a lovely state | 0.8385 |
BERT (bert-base-cased) | George Washington was a great president | 0.8622 |
BERT (bert-large-uncased) | George Washington addressed his supporters | 0.7823 |
BERT (bert-large-uncased) | Taking a flight to Washington tonight | 0.7476 |
BERT (bert-large-uncased) | Arkansaw is a lovely state | 0.7185 |
BERT (bert-large-uncased) | George Washington was a great president | 0.8058 |
BERT (bert-large-cased) | George Washington addressed his supporters | 0.8190 |
BERT (bert-large-cased) | Taking a flight to Washington tonight | 0.7761 |
BERT (bert-large-cased) | Arkansaw is a lovely state | 0.7934 |
BERT (bert-large-cased) | George Washington was a great president | 0.8424 |
ELMo | George Washington addressed his supporters | 0.3986 |
ELMo | Taking a flight to Washington tonight | 0.4577 |
ELMo | Arkansaw is a lovely state | 0.3902 |
ELMo | George Washington was a great president | 0.3886 |
GPT-1 | George Washington addressed his supporters | 0.8232 |
GPT-1 | Taking a flight to Washington tonight | 0.8396 |
GPT-1 | Arkansaw is a lovely state | 0.7307 |
GPT-1 | George Washington was a great president | 0.8003 |
Transformer-XL | George Washington addressed his supporters | 0.2481 |
Transformer-XL | Taking a flight to Washington tonight | 0.1841 |
Transformer-XL | Arkansaw is a lovely state | 0.3009 |
Transformer-XL | George Washington was a great president | 0.2997 |
ELMo looks quite similar to the result with Flair Embeddings :)
Spelling Arkansas correctly may help the model realize that it's a geolocation.
Also, given the 4 sentences, Flair correctly ranks "Taking a flight to Washington tonight" as the most similar, so I don't see the problem. Maybe you'd like the difference in similarity to be larger.
I'd like to see how Transformer-XL and GPT-2 do on this, and maybe even word2vec / fastText.
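For context, similarity scores like the ones in the table above are typically obtained by pooling per-token embeddings into one document vector and comparing document vectors with cosine similarity. A minimal sketch of that recipe, using made-up 3-d vectors as stand-ins (the toy `vectors` dict and tokenization below are purely illustrative, not output of any model in the table):

```python
import numpy as np

def doc_vector(tokens, vectors):
    """Mean-pool per-token embeddings into a single document vector."""
    return np.mean([vectors[t] for t in tokens], axis=0)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# made-up 3-d stand-ins for contextual token embeddings
vectors = {
    "george":     np.array([0.9, 0.1, 0.0]),
    "washington": np.array([0.8, 0.3, 0.1]),
    "taking":     np.array([0.1, 0.7, 0.3]),
    "a":          np.array([0.2, 0.2, 0.2]),
    "flight":     np.array([0.1, 0.9, 0.2]),
}

query = doc_vector(["george", "washington"], vectors)
cand = doc_vector(["taking", "a", "flight"], vectors)
print(round(cosine(query, cand), 4))
```

With real models the pooled vectors come from the embedding library (e.g. Flair's document pool embeddings), but the comparison step is the same cosine shown here.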
@stefan-it - thanks for the table that's quite interesting.
@Hellisotherpeople Arkansaw is a town in Wisconsin. I'd expect it to pick that up as a geolocation too.
Sorry, I just realised I said "it's a lovely state" in the example; I see how that's misleading.
As requested, I added the scores for GPT-1 and Transformer-XL 🤗
Thanks @stefan-it
Tried some other examples with Flair - these actually work well:
Sentence 1 | Sentence 2 | Similarity |
---|---|---|
the bucket and mop are in the closet | he kicked the bucket | 0.5848 |
the bucket and mop are in the closet | i have yet to cross-off all the items on my bucket list | 0.5263 |
the bucket and mop are in the closet | the bucket was filled with water | 0.6970 |
he is currently resting at home | the dog sleeps in the kennel | 0.4730 |
he is currently resting at home | he lived in a beautiful mansion | 0.5347 |
he is currently resting at home | the home office issued penalties for late filing | 0.4030 |
he is currently resting at home | press the home button on your phone | 0.3302 |
Anyone have any further insight or ideas? If not, I'll close this out later on.
Hello @eliehamouche @stefan-it thanks for sharing these results!
Another idea would be not to use the cosine of document vectors as a measure of similarity, but measures that compute document similarity directly from word embeddings. An example of this is the word mover's distance: like document pool embeddings, it need not be trained, so it can be used without supervision. We don't yet have it in Flair, but it's probably not difficult to implement and experiment with. It might be interesting to see how well word mover's distance works with different types of contextualized word embeddings.
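To make the suggestion concrete: word mover's distance is the minimum total cost of "moving" the normalized bag-of-words mass of one document onto the other, where moving word i to word j costs the distance between their embeddings. A small sketch solving this as a linear program with SciPy, on toy two-word embeddings (the `wmd` function and `vecs` dict are illustrative, not Flair API):

```python
import numpy as np
from scipy.optimize import linprog

def wmd(tokens_a, tokens_b, vectors):
    """Word Mover's Distance between two token lists, given word vectors."""
    words_a, words_b = sorted(set(tokens_a)), sorted(set(tokens_b))
    # normalized bag-of-words weights for each document
    d_a = np.array([tokens_a.count(w) for w in words_a], float)
    d_b = np.array([tokens_b.count(w) for w in words_b], float)
    d_a, d_b = d_a / d_a.sum(), d_b / d_b.sum()
    # pairwise Euclidean transport costs between word embeddings
    C = np.array([[np.linalg.norm(vectors[a] - vectors[b]) for b in words_b]
                  for a in words_a])
    n, m = len(words_a), len(words_b)
    # LP over the flattened flow matrix T (row-major):
    # minimize sum_ij T_ij * C_ij  s.t.  row sums = d_a, column sums = d_b
    A_eq, b_eq = [], []
    for i in range(n):                 # row-sum constraints
        row = np.zeros(n * m)
        row[i * m:(i + 1) * m] = 1.0
        A_eq.append(row); b_eq.append(d_a[i])
    for j in range(m):                 # column-sum constraints
        col = np.zeros(n * m)
        col[j::m] = 1.0
        A_eq.append(col); b_eq.append(d_b[j])
    res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    return res.fun

# toy embeddings; real usage would plug in pretrained (contextual) vectors
vecs = {"cat": np.array([1.0, 0.0]), "dog": np.array([0.0, 1.0])}
print(wmd(["cat"], ["dog"], vecs))  # single-word docs: just the embedding distance
```

Libraries like gensim ship an optimized version of this (`KeyedVectors.wmdistance`), so the LP above is only meant to show the mechanics.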
Hey @alanakbik - sorry for the delay, I missed the notification.
That looks quite interesting actually, I'll do a quick comparison and report back.
@alanakbik @eliehamouche Hello, thank you for providing the Transformer-XL embeddings, but I have a question: if I train my own Transformer-XL model, it seems it cannot be integrated into Flair the way ELMo can, where I just provide the option_file and the weight_file.
@songtaoshi I will push a follow-up PR for passing custom models into the newly added embeddings very soon (I've also trained a few XLNet models) :)
@stefan-it Wow, great!!!!! Thanks for your reply. Really looking forward to the new PR.
Trying to explore the contextual side of Flair embeddings with a simple example:
Results:
Would've expected much higher scores on the 'Geo context' sentences. Am I doing something wrong?