RasaHQ / rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
https://rasa.com/docs/rasa/
Apache License 2.0

Decouple confidence normalization from `ranking_length` parameter #9642

Closed · dakshvar22 closed this 2 years ago

dakshvar22 commented 3 years ago

What problem are you trying to solve?

Currently, all ML components (DIETClassifier, ResponseSelector, TEDPolicy) have a parameter called `ranking_length`, which is meant for when the user does not want to see the confidences of all possible candidate labels (intents / actions) and instead wants to see a smaller number of labels.

However, what happens by default is that the model's predicted confidences for the labels within the ranking length are renormalized to sum up to 1. This is wrong because it can artificially boost the confidences of labels and hides the original confidences that were predicted by the model itself. The renormalization step should be decoupled from this parameter.
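For illustration, a minimal sketch of this coupled behaviour (plain numpy with made-up values, not Rasa's actual implementation):

```python
import numpy as np

# Model output over 5 candidate labels (illustrative values).
confidences = np.array([0.30, 0.25, 0.20, 0.15, 0.10])
ranking_length = 3

# Current behaviour: keep the top `ranking_length` confidences and
# renormalize them to sum to 1.
top_k = np.sort(confidences)[::-1][:ranking_length]   # [0.30, 0.25, 0.20]
renormalized = top_k / top_k.sum()                    # [0.40, 0.33, 0.27]

# The top label now reports ~0.40 even though the model only predicted 0.30.
print(renormalized)
```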

Removing the renormalization step entirely is also not possible: if a user has a lot of intents, say 200, all the predicted confidences tend to be very low, which is problematic for fallback. So, renormalization should be applied only when needed, i.e. when the user sees a benefit from it. There is anecdotal evidence of this happening with customers.
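For intuition (illustrative numbers, not taken from a real assistant): with 200 intents and near-uniform model scores, even the winning confidence can sit far below a typical fallback threshold such as 0.3.

```python
import numpy as np

# 200 intents with similar logits: softmax spreads mass thinly.
rng = np.random.default_rng(0)
logits = rng.normal(0.0, 0.5, size=200)
confidences = np.exp(logits) / np.exp(logits).sum()

print(confidences.max())  # on the order of 0.01, well below a 0.3 threshold
```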

What's your suggested solution?

Have a separate parameter, something like `renormalize_confidences`, which, if set to `True`, applies the renormalization.

Note that the renormalization step is currently only applied when `loss_type` is set to `cross_entropy`, and that condition should stay.
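A sketch of the proposed gating, assuming a hypothetical helper name (`should_renormalize` is not an actual Rasa function):

```python
def should_renormalize(loss_type: str, renormalize_confidences: bool) -> bool:
    # Renormalization becomes opt-in via the new flag, and (as today)
    # is only ever applied for the cross-entropy loss.
    return loss_type == "cross_entropy" and renormalize_confidences
```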

Examples (if relevant)

No response

Is anything blocking this from being implemented? (if relevant)

No response

Definition of Done

dakshvar22 commented 3 years ago

@TyDunn Just bringing this issue to your notice. This coupling between `ranking_length` and the renormalization step has come up quite often as a source of problems with how users (some customers) interpret confidences. I think we should decouple this logic in 3.0 itself: ideally, the right thing to do is to not apply this renormalization at all, but for practical reasons, if some users find it useful, we'll need to support it. Doing this in a minor release would mean that we can't avoid renormalization by default.

ka-bu commented 3 years ago

@dakshvar22,

We have that:

So how about we ...

wdyt?

ka-bu commented 3 years ago

Oh, similarly, we could parameterize the sklearn intent classifier: that one always reports confidences for the top 10 intents (if there are that many).

dakshvar22 commented 3 years ago

TED is only using the ranking_length for the purpose of normalisation

Probably a mistake, should be fixed.

So how about we ...

I agree with everything, but let's keep the behaviour universal for all 3 components, i.e.:

  1. `LABEL_RANKING_LENGTH` as the default for `ranking_length`.
  2. Report the top `ranking_length` confidences if `ranking_length > 0`, and otherwise all confidences.
  3. Renormalize all reported confidences if and only if the new `renormalize_confidences` flag is set.
  4. By default, `renormalize_confidences` is set to `False`.

If people want the old behaviour, they can set `renormalize_confidences=True`. Wdyt? (A rough sketch of the combined behaviour is below.)
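A sketch of points 1-4 combined (hypothetical helper, not Rasa's actual API; it uses the setting-to-0 masking discussed below so the output keeps its original length):

```python
import numpy as np

LABEL_RANKING_LENGTH = 10  # default value, as in rasa.nlu.classifiers

def rank_confidences(
    confidences: np.ndarray,
    ranking_length: int = LABEL_RANKING_LENGTH,   # point 1
    renormalize_confidences: bool = False,        # point 4: off by default
) -> np.ndarray:
    values = confidences.copy()
    if ranking_length > 0:
        # Point 2: keep only the top `ranking_length` confidences
        # (everything else is zeroed out).
        values[np.argsort(values)[:-ranking_length]] = 0.0
    if renormalize_confidences and values.sum() > 0:
        # Point 3: renormalize if and only if the flag is set.
        values = values / values.sum()
    return values
```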

ka-bu commented 3 years ago
  1. `LABEL_RANKING_LENGTH` as the default for `ranking_length`.

👍 I wrote 10 above because `LABEL_RANKING_LENGTH` is from `nlu.classifiers`; will move this parameter to some other constants.py then 🤔

  2. Report the top `ranking_length` confidences if `ranking_length > 0`, and otherwise all confidences.

This isn't possible for TED. A policy prediction must contain values for all actions. I think the closest we can get to consistent behaviour is the setting-to-0 approach (and not normalising).

  4. By default, `renormalize_confidences` is set to `False`.

I wasn't sure if it's ok to change that one, but I'm fine with breaking things to unify the behaviour :D

dakshvar22 commented 3 years ago

This isn't possible for TED. A policy prediction must contain values for all actions. I think the closest we can get to consistent behaviour is the setting-to-0 approach (and not normalising).

Ahh right, forgot about it. So, in the setting-to-0 approach, the confidences that have been set to 0 don't contribute to renormalization (if renormalization was turned on), right?

Wasn't sure if it's ok to change that one, I'm fine with breaking things to unify the behaviour

This is exactly why this needs to be done in 3.0. Renormalization is completely wrong IMO, but if it floats the boat for some users, we should support it, just not as the default behaviour 😅

ka-bu commented 3 years ago

Ahh right, forgot about it. So, in the setting-to-0 approach, the confidences that have been set to 0 don't contribute to renormalization (if renormalization was turned on), right?

I'm not sure what you mean by "they don't contribute to renormalisation". We normalise by the sum of the confidences. If we remove them from the list (in DIET), they don't contribute to the sum, just like they don't contribute to the sum if we set them to 0 (in TED).
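A quick demo of that equivalence (illustrative values; assumes the vector is sorted in descending order so the two variants line up):

```python
import numpy as np

conf = np.array([0.4, 0.3, 0.2, 0.1])  # already sorted descending
k = 2

# DIET-style: truncate the list to the top k, then renormalize.
diet = conf[:k] / conf[:k].sum()            # [0.571..., 0.428...]

# TED-style: keep the full vector but zero everything below the top k,
# then renormalize; zeroed entries add nothing to the sum.
ted = conf.copy()
ted[np.argsort(ted)[:-k]] = 0.0
ted = ted / ted.sum()                       # [0.571..., 0.428..., 0.0, 0.0]

assert np.allclose(diet, ted[:k])           # same non-zero confidences
```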

Btw, there's also a tiny bug in the normalisation, `new_values[new_values < ranked[ranking_length - 1]] = 0`, that will probably almost never surface (only in case multiple labels have the same probability as the item at position `ranking_length`), but I'll still fix it along with the rest :)
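To make the edge case concrete (illustrative values; the index-based fix shown is one possible approach, not necessarily the one that landed):

```python
import numpy as np

new_values = np.array([0.4, 0.2, 0.2, 0.1])  # tie at the cutoff position
ranking_length = 2
ranked = np.sort(new_values)[::-1]

# Buggy threshold mask: values *equal* to the cutoff all survive,
# so 3 confidences remain instead of 2.
buggy = new_values.copy()
buggy[buggy < ranked[ranking_length - 1]] = 0
print(np.count_nonzero(buggy))  # 3

# Index-based mask keeps exactly `ranking_length` values
# (which of the tied values survives is then arbitrary).
fixed = new_values.copy()
fixed[np.argsort(fixed)[:-ranking_length]] = 0
print(np.count_nonzero(fixed))  # 2
```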

This is exactly why this needs to be done in 3.0. Renormalization is completely wrong IMO, but if it floats the boat for some users, we should support it, just not as the default behaviour 😅

👍

dakshvar22 commented 3 years ago

I'm not sure what you mean by "they don't contribute to renormalisation". We normalise by the sum of the confidences. If we remove them from the list (in DIET), they don't contribute to the sum, just like they don't contribute to the sum if we set them to 0

Totally my bad, don't know what I was thinking

ka-bu commented 3 years ago

No worries, happens to me all the time ^^. Btw, actually there's no reason to mask the probabilities in TED, is there? The ensemble won't care about the values that we are setting to 0. So maybe we set the default to 0 instead of `LABEL_RANKING_LENGTH`?