RasaHQ / rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
https://rasa.com/docs/rasa/
Apache License 2.0

Wrong classification for words the model has never seen? #1010

Closed kuriakinzeng closed 6 years ago

kuriakinzeng commented 6 years ago

I'm having difficulty understanding why my model behaves the way it does:

My model is trained on a big but quite homogeneous dataset, with queries such as "give me some advice" (intent: advice) or "tell me a joke" (intent: joke). The trained model works very well for similar queries. However, when it sees new phrases and/or words such as "apple" or "banana" that are obviously neither advice nor joke, they still get classified as the advice or joke intent with very high confidence (>85%).

Q: Any idea why this is happening? And do you have any advice on how I can do this better?

What I have tried:

wrathagom commented 6 years ago

Can you provide your pipeline? Also, do you know which spacy model you are using (if English: small, medium, or large)?

As for point 1, the goal isn't to add every word in the English language, but rather just enough to pull confidence away from the other 2 intents.

Do you only have the 2 intents? 20k training examples sounds like a lot, maybe even too many. Do you need those for entity training, or are there really 20k unique ways to ask the question?

kuriakinzeng commented 6 years ago

Thanks for the prompt response. I am using the medium spacy model. Pipeline-wise, I didn't specify one - is there a default?

I have 5 other intents with small datasets (e.g. greet, bye, fallback).

I used a dataset generator; the 20k includes questions of the same few formats, varying only by the entity, e.g. "tell me a joke about school" and "tell me a joke about life", where "school" and "life" are the entities.
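A generator of that kind is basically a cross product of sentence templates and entity values. A minimal sketch (the names, format, and templates here are illustrative, not the actual generator used):

```python
from itertools import product

# Hypothetical sketch of a template-based dataset generator: every sentence
# template is combined with every entity value (the full cross product),
# which is how a handful of patterns balloons into 20k examples.
templates = ["tell me a joke about {}", "give me a joke about {}"]
entities = ["school", "life", "work"]

examples = [
    {"text": tpl.format(ent), "intent": "joke", "entities": [{"value": ent}]}
    for tpl, ent in product(templates, entities)
]

print(len(examples))  # 2 templates x 3 entities = 6
```

With 20k examples produced this way, the intent classifier sees the same few surface patterns over and over, which is a recipe for the overfitting discussed later in this thread.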

ctrado18 commented 6 years ago

Hey,

I am experiencing the same issues with the German small model. Maybe the reason is that you might have too many examples, so your NER is overfitting on new words? My NER now detects unseen new words very well. I trained my NLU with just 20 different entities in various sentence structures. But I face the problem that it sometimes fails with longer compound words made up of different words, like "Apfelbaumgarten" in German (garden of apple trees) :-)

kuriakinzeng commented 6 years ago

Thanks @ctrado18

Which of the following are you doing?

  1. Find every combination of sentence structures and entities. So if you have 3 different sentence structures and 4 entities, you have 12 combinations.
  2. Randomly combine sentence structures with entities. So if you have 3 different sentence structures and 4 entities, you will end up with maybe 3-6 combinations?

Can I look at your training data by any chance? No obligations though :)

ctrado18 commented 6 years ago

I use the first one.

I tested again and found something strange. What might be the reason, or is it just because of too little training data?

Some words get misclassified whereas others are classified very well! I use a sentence the model was trained on and plug in an untrained entity, and it is not recognized for some words! Does this depend on the spacy model, or do I have to train more and use more entities?

kuriakinzeng commented 6 years ago

@wrathagom Here's what I used as the pipeline: "pipeline": "spacy_sklearn". I saw quite a few pipelines available, and one can even design a custom one. Any advice on how I should choose a pipeline?
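For reference, `spacy_sklearn` was a registered pipeline template in Rasa NLU at the time; the old docs list it as expanding to roughly this component list (version-dependent, so check the docs for your release):

```json
{
  "pipeline": [
    "nlp_spacy",
    "tokenizer_spacy",
    "intent_entity_featurizer_regex",
    "intent_featurizer_spacy",
    "ner_crf",
    "ner_synonyms",
    "intent_classifier_sklearn"
  ]
}
```

Writing out the expanded list like this is also the usual first step toward a custom pipeline: you drop or swap individual components instead of using the shortcut name.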

wrathagom commented 6 years ago

@kuriakinzeng I have a strong suspicion that @ctrado18 is right on the overfitting and unbalanced intents causing problems.

Generally speaking, you don't need very many examples to train intents (especially simple ones with few variants), but you can need thousands of examples to train entities. It may be useful to remove the intent label from the vast majority of your training data. If the intent label isn't present, the example will still be used to train the entity recognizer, but won't influence the intent classifier. In this way you could balance the intents...
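That suggestion can be sketched as a small preprocessing step: keep the intent label on a capped sample per intent and drop it from the rest, so the bulk of the data still trains the entity recognizer without skewing the intent classifier. The function name and field names below are illustrative (they follow the old Rasa NLU JSON example shape), not an actual Rasa API:

```python
import random

# Hedged sketch: cap how many examples per intent keep their label.
# Unlabeled examples still train the entity recognizer, but no longer
# influence (and unbalance) the intent classifier.
def rebalance(examples, keep_per_intent=50, seed=0):
    random.seed(seed)
    by_intent = {}
    for ex in examples:
        by_intent.setdefault(ex.get("intent"), []).append(ex)
    out = []
    for intent, group in by_intent.items():
        random.shuffle(group)
        for i, ex in enumerate(group):
            ex = dict(ex)
            if i >= keep_per_intent:
                ex.pop("intent", None)  # entity training only
            out.append(ex)
    return out

data = [{"text": f"tell me a joke about {i}", "intent": "joke"} for i in range(100)]
balanced = rebalance(data, keep_per_intent=10)
print(sum(1 for ex in balanced if "intent" in ex))  # 10
```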

As far as pipelines, I would stick to the default until it doesn't do something you want. (like duckling)

kuriakinzeng commented 6 years ago

@wrathagom That's a great suggestion. Thank you. Let me try that and report back my results :)

ctrado18 commented 6 years ago

@wrathagom So, for entities you need many more examples. Is it then just normal that an entity isn't extracted even though you use the same sentence structure it was trained with?

wrathagom commented 6 years ago

Yes that makes sense because of how the generalization and training work.

@kuriakinzeng I am closing for now, but please re-open if my suggestion doesn't help!

kuriakinzeng commented 6 years ago

@wrathagom I tried a smaller and hopefully a balanced dataset available here

Now, instead of being classified under "joke" (chuckNorris in this case) or "advice" intent, it is classified as "greet." Do you have any idea why this is happening?

Perhaps unrelatedly, why is SVM a popular choice for intent classification? Rasa also uses an SVM, right?

Thanks so much!

kuriakinzeng commented 6 years ago

@wrathagom I can't re-open this issue. Can you help?

akelad commented 6 years ago

@kuriakinzeng that link doesn't work - can you share your training data a different way? As for the SVM - we use it because it's a simple classifier that we get the best results with. We've tried other classifiers like NNs, but there's not much improvement

kuriakinzeng commented 6 years ago

Thanks for looking into it @akelad I have edited my comment above to provide the correct link :)

akelad commented 6 years ago

@ctrado18 idk if you've figured this out yourself yet - but we don't recommend using the small German spacy model. I know spacy doesn't provide a larger one for spacy 2.0, so I'd suggest downgrading spacy to 1.8.x. @kuriakinzeng thanks, I'll get round to looking at it at some point today

akelad commented 6 years ago

@kuriakinzeng i can't see anything obviously wrong with your data, apart from it being a bit unbalanced (the chuckNorris intent has only 13 examples, greet double that). I'd say try running the evaluate script on your data and see where the confusion is happening, then balance out the data a bit if that shows a lot of intent confusion. Also - are you using the medium spacy model?

kuriakinzeng commented 6 years ago

Thanks @akelad

  1. Should I run evaluate on the same dataset?
  2. I'm using the medium spacy model indeed.

wrathagom commented 6 years ago

Yes to running it with the same dataset. I find it fascinating that you've combined all of my tutorials into a single training set! And terrifying at the same time! 😆

I'm testing it now.

wrathagom commented 6 years ago

@kuriakinzeng I think it worked, though look at the confidence of a request like "I want a joke" vs. "apple". In my case it was .68 vs. .44; if you implemented a fallback threshold of .5 you would have something to key off of when the user asks for an unknown intent.

                  precision  recall  f1-score  support
advice                 1.00    0.94      0.97       16
chuckNorris            1.00    1.00      1.00       13
fallback               0.94    1.00      1.00       15
goodbye                1.00    1.00      1.00       38
greet                  1.00    1.00      1.00       22
marriageProposal       1.00    1.00      1.00        8
avg / total            0.99    0.99      0.99      117

The summary is that this is an iterative process, and things like feedback loops are essential to fine-tuning the end experience.

I would double the number of examples of the advice, chuckNorris, and fallback intents and try again. Let me know what other problems you are seeing.
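The fallback threshold suggested above is just a comparison against the parse result's confidence. A minimal sketch (the parse-result shape follows Rasa NLU's `{"intent": {"name": ..., "confidence": ...}}` output; the routing function itself is hypothetical):

```python
# Route to a fallback intent whenever the classifier's confidence is low,
# instead of trusting whichever known intent scored highest.
FALLBACK_THRESHOLD = 0.5

def route(parse_result):
    intent = parse_result["intent"]
    if intent["confidence"] < FALLBACK_THRESHOLD:
        return "fallback"  # e.g. reply "Sorry, I didn't get that"
    return intent["name"]

print(route({"intent": {"name": "joke", "confidence": 0.68}}))  # joke
print(route({"intent": {"name": "joke", "confidence": 0.44}}))  # fallback
```

With the .68 vs .44 numbers above, "I want a joke" passes the threshold and "apple" drops into the fallback.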

ctrado18 commented 6 years ago

@akelad thanks. I had not thought about downgrading spacy. I thought there were only small German sets. I'm struggling to find out why some words (which may not be inside the small dataset) are not recognized as entities, although the same sentence structure is used as in training. Can you share your ideas about that? Do I have to add more entity examples? Although I use just 20 different entity examples, the NLU detects unseen entities well, but misses some... The untrained ones are somehow custom words. But I thought the NLU was able to detect unseen words as entities even though they are not in the spacy language model?! Otherwise I would have to use a phrase matcher for my custom words, and a NER component would be senseless?

Like these sentences: I trained on "I need a cake" and then tested on "I need a cheesecake", where "cheesecake" (not part of training) is not recognized.

Thank you! And btw, you guys are doing a great job!!

wrathagom commented 6 years ago

The models that @akelad mentioned should only impact intent classification. Entities (when using the CRF) don't depend on the model and, as you said, are based more on sentence structure and text features.

What's an example of a sentence that worked and one that didn't?

ctrado18 commented 6 years ago

@wrathagom Yes, my intent classification works fine. It is only about my entities. It is the type of sentence I wrote above, but in German. I have not yet run the evaluation script. Maybe I can find out more about the CRF and what is going on?

wrathagom commented 6 years ago

The docs for the underlying library are here: https://sklearn-crfsuite.readthedocs.io/en/latest/ You can also glean some information by looking at the features the Rasa CRF uses in the ner_crf component source.

ctrado18 commented 6 years ago

@wrathagom On Gitter you said that words like "nop" may not have a vector representation in the spacy model and therefore might not be classified. Maybe you were referring there to another NER than the CRF, right? The CRF should also work with unknown words, because it does not use a vector representation? Although I use German, it should not matter for entity recognition which model I use? So I have to give more examples? I will definitely try the evaluation script first now. Afterwards I will try the new NLU with tensorflow. Then you have no entities, and I am really excited to see how this will be better! 😀

Have you also experienced this issue with English and the CRF? You should have, since the spacy model plays no important role. Can you give some tips? Do I just have to give more examples?

rithwikjc commented 6 years ago

Hi. Not exactly adding to the topic, but I had a question. @wrathagom

It may be useful for you in your training data to remove the intent label from the vast majority of your training data.

In the JSON format I see that you can just leave the intent field of the dictionary empty. But how would one do that in the markdown format?

## intent:
- hi
- hello
- hey

Leaving the intent empty as shown above leads to the examples being put into the regex_features in the training_data.json file that is created in the model. I was trying to look at the code and figure out how the .md files are converted to .json, but couldn't figure it out.

kuriakinzeng commented 6 years ago

@wrathagom @akelad Thanks for the effort! Although I can set a threshold of 0.5 to move 'apple' from the joke intent to an unknown intent, I wonder why this is happening in the first place. Further, when I try 'orange', it returns 0.88 confidence that it is a greet intent. That seems strange, because I didn't train on these terms. Is Rasa/spacy using a dictionary or word2vec that could explain the relevance of 'orange' to other words in my dataset?

akelad commented 6 years ago

@rithwikjc you really don't want to leave your intent name empty. If you're trying to train an intent that has examples that aren't relevant to your bot, then call it something like out_of_scope

akelad commented 6 years ago

@kuriakinzeng yeah so the sentence representation that is used is an average of word vectors of words in the sentence, and then the sentence is classified using an SVM.
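A toy version of that featurization, with made-up two-dimensional vectors standing in for spacy's word vectors, shows why unseen words still land somewhere in the intent space:

```python
# Toy illustration of the sentence featurization described above: the
# sentence vector is the mean of its word vectors, and the SVM must then
# place that point into one of the known intents - even for a word like
# "orange", whose vector sits near *something* in the training data.
# These vectors are invented for the example; spacy supplies real ones.
toy_vectors = {
    "tell": [0.9, 0.1], "me": [0.5, 0.5], "a": [0.4, 0.6],
    "joke": [0.95, 0.05], "orange": [0.6, 0.4],
}

def sentence_vector(tokens):
    vecs = [toy_vectors.get(t, [0.0, 0.0]) for t in tokens]  # OOV -> zeros
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

print(sentence_vector(["tell", "me", "a", "joke"]))  # approx [0.6875, 0.3125]
```

Because every input is reduced to a point in the same vector space, there is no notion of "none of the above" unless you add one yourself (e.g. the confidence threshold discussed earlier in the thread).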

ctrado18 commented 6 years ago

@akelad If I use a German sentence with words that have no vector representation, like custom words, does this influence my intent classification? How was this handled in versions before 12.x? Do they just get the value null?

Also, with NLU version 12.2, in a sentence like "How much costs a new bike", "new bike" gets recognized as a whole entity. But this might be because I have not trained with adjectives in front of the word? I just trained with sentences like "How much costs a glass?"

Also, if I train just singular words in my examples and test with sentences containing the plural of that word, where the difference is just one letter at the end, it fails to recognize it. The singular word is recognized though! Why is that? My NLU now detects unseen words very well, but just the singular form I trained with, not the plural form...

NLU is very sensitive to things like that when you have not trained it on such cases. One way to handle it is to write an action which splits the entity into tokens and just checks whether one of the tokens matches the right entity.

rithwikjc commented 6 years ago

@akelad What I meant was that in the JSON format training data, the intent is an optional field. So @wrathagom was saying it is possible to include examples without an intent for training the entity extractor. I was wondering what the equivalent is in the Markdown format training data, i.e. how to write examples for entity extraction without providing an intent.

akelad commented 6 years ago

@ctrado18 yes, it does influence the intent classification - this also hasn't changed in version 12 if you're using spacy_sklearn. If there are enough other relevant words in the sentence, however, the sentence should still get classified correctly.

Yes it's most likely to do with your training data.

Is this a problem with entity recognition or intent classification? Because if it's for entities: if there's just a single word, there's no context, and so the model will sometimes fail to extract words it hasn't seen before. If you get single words a lot, you can always use the whole message text as a fallback when an entity isn't recognised.

@rithwikjc we've deprecated providing just entity examples separately

ctrado18 commented 6 years ago

@akelad By "singular" I meant the opposite of the plural form of the word, like bike vs. bikes. That is why I opened another issue about single words as user input, like the user typing just "bike". Even though there is no context, I would like to have it recognized as an entity. So would it be a good idea to train the CRF with sentences and single words together?

"if you have single words a lot you can always use the whole message text as a fallback when an entity isn't recognised."

I don't understand this statement.

ctrado18 commented 6 years ago

@akelad How can I find out if specific words are inside the spacy model? Like the verb "kaputt" in "mein Fahrrad ist kaputt" ("my bike is broken"), so I can use an intent "problem" instead of the entity "problem".

amn41 commented 6 years ago

you can use the has_vector method: https://spacy.io/api/token#has_vector

ctrado18 commented 6 years ago

@amn41 Does it matter for the CRF whether the word has a representation? Do you have to take care of all the complex grammar structures when you use a language like German? Even just the plural form gets tricky, because the verb form depends on it: e.g. my NLU fails to detect "was kosten gehilfen" ("what do assistants cost"), but the singular form "was kostet gehilfe" works. So the form of the verb "kosten" is important. Do you have to take care of all of this?

ctrado18 commented 6 years ago

I think you don't need to train all verb forms, as they have the same POS tag? Both verbs in the German sentences have the same POS tag: "was kostet gehilfe" and "was kosten gehilfen".

But I found something strange. Most of my nouns are recognised as verbs or adjectives?


import spacy

# load the small German model and print POS information for each token
nlp = spacy.load('de_core_news_sm')
doc = nlp(u'was kostet eine gehilfe')

for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_)

I get:

was wer PRON PWS sb
kostet kosten VERB VVFIN ROOT
eine einen DET ART nk
gehilfe gehilfe ADJ ADJA mo

What can I do about that? How can you build a German chatbot with these inaccuracies?

ctrado18 commented 6 years ago

I discovered something very strange. I am developing my own lemmatizer and therefore needed my own POS tagger, using TIGER for German. After all that, I found that my POS tagger, and also the POS tagger from spacy, are case sensitive! That's why the noun "gehilfe" above is not tagged as a noun, but "Gehilfe" is... Hasn't anyone thought of that? That should be fixed in spacy, right? Since this is very bad!

I don't see where I can fix that?

akelad commented 6 years ago

I mean, capitalization does kind of matter in German when it comes to nouns, given that they're all capitalized. When they're not, it's sometimes hard to tell whether something is a noun or a verb. But these POS tags shouldn't really cause too much of an issue, given that this isn't the only thing ner_crf pays attention to. I would just make sure you have enough examples in your training data. I've built a chatbot in German before and haven't had huge problems with this.

How many examples per entity/intent do you have in your data? And have you run the evaluate script on your data? Also, as for "if you have single words a lot you can always use the whole message text as a fallback when an entity isn't recognised" - if users respond to questions with single words a lot, you can write a custom action that checks whether an entity was recognised, and if not, just grabs the whole text they sent.
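That whole-message fallback could look roughly like this (plain Python sketch; the function name and entity format are illustrative, not Rasa's custom-action API):

```python
# Hedged sketch of the single-word fallback described above: if no matching
# entity was extracted and the user message is a single token, treat the raw
# text itself as the entity value.
def extract_item(message_text, entities):
    for ent in entities:
        if ent.get("entity") == "item":
            return ent["value"]
    tokens = message_text.strip().split()
    if len(tokens) == 1:   # single word: no context for the CRF to use
        return tokens[0]   # fall back to the whole message
    return None

print(extract_item("bike", []))           # bike
print(extract_item("I need a cake", []))  # None
```

In a real bot this check would live in the custom action that fills the slot, so longer messages with a missed entity still fall through to a clarification question instead of grabbing the wrong text.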

ctrado18 commented 6 years ago

@akelad Thank you! But users mostly write messages in all lower case! That's why nouns get mistagged. I tested it and added more examples where they get mistagged, so that it works for those cases. But without this you would need less training data! Since I am working on a chatbot, it makes sense to fix that? But I don't know how to turn off this feature? Could you do that in Rasa? I don't have much experience with the spacy architecture.

Right now I have 1 entity and 1 intent, and a few hundred sentences with about 30 entity value examples.

Now I understand what you meant. Isn't that also a good solution for handling single words as input in general? You check if no entity is recognised, then you check the length of the message. If it has a length of just one token, you grab the message! Sounds very good?

What I also thought of is a fallback strategy if no entity is found at all (single words or whole sentences). Is it possible to call a pipeline component like a phrase matcher conditionally in a custom action? Or generally, to run a pipeline component only under a specific condition, not in all circumstances?

wrathagom commented 6 years ago

@kuriakinzeng did you get your question answered? This thread has gotten a bit out of control 💥

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 6 years ago

This issue has been automatically closed due to inactivity. Please create a new issue if you need more help.