Closed pommedeterresautee closed 4 years ago
FYI I ran the Imdb model against some opinion based news and I didn't find the results to be all that meaningful.
Yes this model was trained using IMDB data, i.e. film related, so its only a sentiment model for movie reviews. I have to check why the label names changed, very strange!
@alanakbik This is very strange.
I have cleaned $HOME/.flair
folder.
Then I executed that code
from flair.datasets import IMDB, DataLoader
from flair.models import TextClassifier
classifier = TextClassifier.load('en-sentiment')
corpus = IMDB()
sentences = list(corpus.test)
It downloads what it has to.
2019-10-05 23:01:17,042 https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/models-v0.4/classy-imdb-en-rnn-cuda%3A0/imdb-v0.4.pt not found in cache, downloading to /tmp/tmpgxn7mqor
100%|██████████| 1501979561/1501979561 [00:32<00:00, 45600891.88B/s]
2019-10-05 23:01:50,139 copying /tmp/tmpgxn7mqor to cache at /home/.../.flair/models/imdb-v0.4.pt
2019-10-05 23:01:51,112 removing temp file /tmp/tmpgxn7mqor
2019-10-05 23:01:51,227 loading file /home/.../.flair/models/imdb-v0.4.pt
2019-10-05 23:02:00,766 http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz not found in cache, downloading to /tmp/tmpm1n0lup3
100%|██████████| 84125825/84125825 [00:10<00:00, 8308111.70B/s]
2019-10-05 23:02:11,202 copying /tmp/tmpm1n0lup3 to cache at /home/.../.flair/datasets/imdb/aclImdb_v1.tar.gz
2019-10-05 23:02:11,267 removing temp file /tmp/tmpm1n0lup3
2019-10-05 23:02:22,099 Reading data from /home/.../.flair/datasets/imdb
2019-10-05 23:02:22,099 Train: /home/.../.flair/datasets/imdb/train.txt
2019-10-05 23:02:22,099 Dev: None
2019-10-05 23:02:22,099 Test: /home/.../.flair/datasets/imdb/test.txt
Then I do
sentences[0].labels
# Out[12]: [pos (1.0)]
And
a = classifier.predict(sentences=sentences[:100],
mini_batch_size=16,
embedding_storage_mode="none")
a[0].labels
# Out[13]: [POSITIVE (0.9999998807907104)]
So labels ARE different.
Of course, now:
sentences[0].labels
# Out[14]: [POSITIVE (0.9999998807907104)]
as sentence labels are overriden by predict()
call.
first line of /home/.../.flair/datasets/imdb/test.txt
__label__pos Back in 1982 a little film called MAKING LOVE shocked audiences with its frank and open depiction of a romantic love story that just happened to be about two men.<br /><br />I have been waiting for years for a good, old-fashioned romance between two men; LATTER DAYS is all that and more.<br /><br />Yes, it is soapy, melodramatic, cliché-ridden, and quite corny. That is what makes it so wonderful. There is nothing like a good romantic movie, and this movie is romantic in the best sense of the word.<br /><br />As to the issue of religion, sorry folks, but these things do happen and are happening to gay people even now. It is not just the Mormon church that rejects its gay members. Gay people in every religion have faced harsh judgment and rejection.<br /><br />I loved this movie. It has a perfect blend of a fantasy-romance grounded in the reality of the day-to-day lives of the characters. If I could give it more than ten stars I would. Good love stories never go out of style; great love stories like LATTER DAYS are unforgettable.<br /><br />It's about time!
To make it short, label from model is POSITIVE
and label from dataset is pos
.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
PR #1545 adds new sentiment datasets and homogenizes labels across datasets. Also, there is now an option to define "name maps" to map label names to other names.
Describe the bug
According to TUTORIAL 2, classification model en-sentiment has been trained on IMDB dataset.
evaluate()
function on en-sentiment model trained on IMDB produces 0 score when tested on test set of IMDB.Reasons seem to be a change in label names.
Model: ['POSITIVE'], ['NEGATIVE'] dataset: ['pos'], ???
I have not a single prediction with negative label. (which is another issue)
IMDB is not marked as deprecated in source code.
To Reproduce
Print:
Expected behavior A score which is not zero
Solution
Retrain / reshare a new en-sentiment model? // redo IMDB dataset
Context PiPy version of Flair / same bug on master branch
Note
Btw, the model seems to not work very well.
"I never saw something that bad." -> positive "I do not like this film" -> positive "I hate this film" -> negative (finally)