flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

Contextual embeddings don't seem to work #791

Closed elie-h closed 5 years ago

elie-h commented 5 years ago

Trying to explore the contextual side of Flair embeddings with a simple example:

import torch

from flair.data import Sentence
from flair.embeddings import FlairEmbeddings, DocumentPoolEmbeddings

# your query
query = 'The capital of Washington'

# some texts
sentences = [
    'George Washington addressed his supporters',
    'Taking a flight to Washington tonight',
    'Arkansaw is a lovely state',
    'George Washington was a great president',
]

# first, declare how you want to embed
embeddings = DocumentPoolEmbeddings([FlairEmbeddings('news-forward'), 
                                     FlairEmbeddings('news-backward')
                                    ])

# embed
q = Sentence(query)
embeddings.embed(q)

# use cosine distance
cos = torch.nn.CosineSimilarity(dim=0, eps=1e-6)

for sentence in sentences:
    s = Sentence(sentence)
    embeddings.embed(s)
    prox = cos(q.embedding, s.embedding)
    print(query, ' - ', sentence, ' - ', prox)

Results:

The capital of Washington  -  George Washington addressed his supporters  -  0.3869
The capital of Washington  -  Taking a flight to Washington tonight  -  0.4389
The capital of Washington  -  Arkansaw is a lovely state  -  0.3746
The capital of Washington  -  George Washington was a great president  -  0.3629

I would've expected much higher scores on the 'Geo context' sentences. Am I doing something wrong?

stefan-it commented 5 years ago

Technically, the code looks good. Here are some other comparisons with BERT and ELMo:

| LM | Sentence | Similarity |
| --- | --- | --- |
| BERT (bert-base-uncased) | George Washington addressed his supporters | 0.6652 |
| BERT (bert-base-uncased) | Taking a flight to Washington tonight | 0.6186 |
| BERT (bert-base-uncased) | Arkansaw is a lovely state | 0.5656 |
| BERT (bert-base-uncased) | George Washington was a great president | 0.6955 |
| BERT (bert-base-cased) | George Washington addressed his supporters | 0.8641 |
| BERT (bert-base-cased) | Taking a flight to Washington tonight | 0.8477 |
| BERT (bert-base-cased) | Arkansaw is a lovely state | 0.8385 |
| BERT (bert-base-cased) | George Washington was a great president | 0.8622 |
| BERT (bert-large-uncased) | George Washington addressed his supporters | 0.7823 |
| BERT (bert-large-uncased) | Taking a flight to Washington tonight | 0.7476 |
| BERT (bert-large-uncased) | Arkansaw is a lovely state | 0.7185 |
| BERT (bert-large-uncased) | George Washington was a great president | 0.8058 |
| BERT (bert-large-cased) | George Washington addressed his supporters | 0.8190 |
| BERT (bert-large-cased) | Taking a flight to Washington tonight | 0.7761 |
| BERT (bert-large-cased) | Arkansaw is a lovely state | 0.7934 |
| BERT (bert-large-cased) | George Washington was a great president | 0.8424 |
| ELMo | George Washington addressed his supporters | 0.3986 |
| ELMo | Taking a flight to Washington tonight | 0.4577 |
| ELMo | Arkansaw is a lovely state | 0.3902 |
| ELMo | George Washington was a great president | 0.3886 |
| GPT-1 | George Washington addressed his supporters | 0.8232 |
| GPT-1 | Taking a flight to Washington tonight | 0.8396 |
| GPT-1 | Arkansaw is a lovely state | 0.7307 |
| GPT-1 | George Washington was a great president | 0.8003 |
| Transformer-XL | George Washington addressed his supporters | 0.2481 |
| Transformer-XL | Taking a flight to Washington tonight | 0.1841 |
| Transformer-XL | Arkansaw is a lovely state | 0.3009 |
| Transformer-XL | George Washington was a great president | 0.2997 |
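
One way to run this kind of comparison is to swap the embedding classes in the snippet from the first comment. A sketch, assuming Flair's BertEmbeddings and ELMoEmbeddings wrappers (not necessarily how the numbers above were produced):

# Sketch: swapping embedding types into the same pooling + cosine setup.
# The BertEmbeddings / ELMoEmbeddings wrappers and model names are assumptions,
# not necessarily what was used for the table above.
import torch
from flair.data import Sentence
from flair.embeddings import BertEmbeddings, ELMoEmbeddings, DocumentPoolEmbeddings

setups = {
    'BERT (bert-base-uncased)': DocumentPoolEmbeddings([BertEmbeddings('bert-base-uncased')]),
    'ELMo': DocumentPoolEmbeddings([ELMoEmbeddings()]),
}
cos = torch.nn.CosineSimilarity(dim=0, eps=1e-6)

for name, document_embeddings in setups.items():
    # fresh Sentence objects per setup so embeddings don't accumulate
    query = Sentence('The capital of Washington')
    candidate = Sentence('Taking a flight to Washington tonight')
    document_embeddings.embed(query)
    document_embeddings.embed(candidate)
    print(name, float(cos(query.embedding, candidate.embedding)))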
stefan-it commented 5 years ago

ELMo looks quite similar to the result with Flair Embeddings :)

Hellisotherpeople commented 5 years ago

Spelling Arkansas correctly may help the model realize that it's a geolocation.

Also, given the 4 sentences, Flair correctly ranks "Taking a flight to Washington tonight" as the most similar, so I don't see the problem. Maybe you'd just like the gap in similarity to be larger.

I'd like to see how Transformer-XL and GPT-2 do on this, and maybe even word2vec / fastText.
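
For the word2vec / fastText side, classic static vectors can be plugged into the same setup via Flair's WordEmbeddings wrapper. A sketch, assuming 'glove' is an available identifier for pre-trained GloVe vectors:

# Sketch: classic (non-contextual) word vectors in the same pooling + cosine setup.
# 'glove' is an assumed identifier for Flair's pre-trained GloVe vectors;
# a fastText-style model could be substituted if available.
import torch
from flair.data import Sentence
from flair.embeddings import WordEmbeddings, DocumentPoolEmbeddings

static_embeddings = DocumentPoolEmbeddings([WordEmbeddings('glove')])
cos = torch.nn.CosineSimilarity(dim=0, eps=1e-6)

query = Sentence('The capital of Washington')
static_embeddings.embed(query)

candidate = Sentence('Arkansas is a lovely state')
static_embeddings.embed(candidate)

print(float(cos(query.embedding, candidate.embedding)))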

elie-h commented 5 years ago

@stefan-it - thanks for the table, that's quite interesting.

@Hellisotherpeople Arkansaw is a town in Wisconsin. I would expect it to pick up on that as a geolocation too.

Sorry, I just realised I said it's a lovely state in the example - I see how that's misleading.

stefan-it commented 5 years ago

As requested, I added the scores for GPT-1 and Transformer-XL 🤗

elie-h commented 5 years ago

Thanks @stefan-it

Tried some other examples with Flair - these actually work well:

the bucket and mop are in the closet  -  he kicked the bucket  -  0.5848
the bucket and mop are in the closet  -  i have yet to cross-off all the items on my bucket list  -  0.5263
the bucket and mop are in the closet  -  the bucket was filled with water  -  0.6970
he is currently resting at home  -  the dog sleeps in the kennel  -  0.4730
he is currently resting at home  -  he lived in a beautiful mansion  -  0.5347
he is currently resting at home  -  the home office issued penalties for late filing  -  0.4030
he is currently resting at home  -  press the home button on your phone  -  0.3302

Anyone have any further insight or ideas? If not, I'll close this out later on.

alanakbik commented 5 years ago

Hello @eliehamouche @stefan-it, thanks for sharing these results!

Another idea would be not to use the cosine similarity of document vectors as the similarity measure, but a measure that derives document similarity directly from the word embeddings. An example of this is the word mover's distance: like document pool embeddings, it needs no training, so it can be used without supervision. We don't have it in Flair yet, but I think it's probably not difficult to implement and experiment with. It might be interesting to see how well word mover's distance works with different types of contextualized word embeddings.
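
A minimal sketch of that idea, using a relaxed word mover's distance (nearest-neighbour matching instead of a full optimal-transport solver) over Flair token embeddings; the helper functions below are hypothetical, not part of Flair:

# Relaxed word mover's distance over contextual token embeddings: a cheap
# approximation of the full WMD. Helper names are hypothetical.
import torch
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings, StackedEmbeddings

token_embeddings = StackedEmbeddings([FlairEmbeddings('news-forward'),
                                      FlairEmbeddings('news-backward')])

def embed_tokens(text):
    """Embed a sentence and return a (num_tokens, dim) matrix of token vectors."""
    sentence = Sentence(text)
    token_embeddings.embed(sentence)
    return torch.stack([token.embedding for token in sentence])

def relaxed_wmd(text_a, text_b):
    """Symmetrised relaxed WMD: mean distance from each token to its
    nearest neighbour in the other sentence (lower = more similar)."""
    a, b = embed_tokens(text_a), embed_tokens(text_b)
    pairwise = torch.cdist(a, b)  # all token-to-token Euclidean distances
    return 0.5 * (pairwise.min(dim=1).values.mean()
                  + pairwise.min(dim=0).values.mean())

print(float(relaxed_wmd('The capital of Washington',
                        'Taking a flight to Washington tonight')))

Note that, unlike cosine similarity, lower values mean more similar here; the full word mover's distance would additionally weight tokens and solve an optimal-transport problem.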

elie-h commented 5 years ago

Hey @alanakbik - sorry for the delay, I missed the notification.

That looks quite interesting actually, I'll do a quick comparison and report back.

songtaoshi commented 5 years ago

@alanakbik @eliehamouche Hello, thank you very much for providing the Transformer-XL embeddings, but I have a question: if I train my own Transformer-XL model, it seems it cannot be integrated into Flair's embeddings the way ELMo can, where I just provide the option_file and the weight_file.

stefan-it commented 5 years ago

@songtaoshi I will push a follow-up PR for passing custom models into the newly added embeddings very soon (I've also trained a few XLNet models) :)

songtaoshi commented 5 years ago

@stefan-it Wow, great! Thanks for your reply. Really looking forward to the new PR.