Closed: dakshvar22 closed this issue 2 years ago
@TyDunn Just bringing this issue to your notice. This coupling between `ranking_length` and the renormalization step has come up quite often as causing problems with how users (including some customers) interpret confidences.
I think we should decouple this logic in 3.0 itself, because ideally the right thing is to not apply this renormalization at all; for practical reasons, if some users find it useful, we'll need to support it. Doing this in a minor release would mean that we can't avoid renormalization by default.
@dakshvar22 ,

We have that:

- `TED` is only using the `ranking_length` for the purpose of normalisation (i.e. there seems to be no usage of that for the purpose of pruning predicted actions from predictions, or am I missing something?)
- `DIETClassifier` and `ResponseSelector` use it for two things:
  - they report the top `min(ranking_length, LABEL_RANKING)` many confidences if `ranking_length` is defined - otherwise it's fixed to `LABEL_RANKING`
  - they normalise the top `ranking_length` many items, which means the reported confidences do not sum up to 1 if `ranking_length > LABEL_RANKING` (cf. the `test_softmax_normalization` case commented with "higher than default ranking_length" which does not sum up to 1 because of this)

So how about we ...

- let `DIETClassifier`/`ResponseSelector`
  - use `LABEL_RANKING` as default for `ranking_length`
  - report the `ranking_length` many top confidences if `ranking_length > 0`, and otherwise all confidences
  - renormalize only if a `renormalize_confidences` flag is set
- and for `TED` we
  - if `ranking_length > 0` (which is kept at its default of `10`), always set all confidences that are not among the top `ranking_length` many to 0 (this is the only possible analog to "reporting only x many" I think)
  - renormalize only if the `renormalize_confidences` flag is set (which is `True` by default, so no old config should break)

wdyt?
Oh, similarly, we could parameterize the sklearn intent classifier - that one will always report confidences for the top 10 intents (if there are that many).
> TED is only using the ranking_length for the purpose of normalisation

Probably a mistake, should be fixed.
> So how about we ...

I agree with everything, but let's keep the behaviour universal for all 3 components, i.e.:

- `LABEL_RANKING` as default for `ranking_length`
- report `ranking_length` many top confidences if `ranking_length > 0`, and otherwise all confidences
- renormalize only if the `renormalize_confidences` flag is set
- by default, `renormalize_confidences` is set to `False`

If people want to get the old behaviour, they can set `renormalize_confidences=True`.

Wdyt?
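The unified behaviour proposed above could be sketched roughly as follows (function and parameter names here are illustrative, not the actual Rasa API; `keep_all` stands in for the policy case, where a prediction must contain a value for every label):

```python
import numpy as np

def apply_ranking(confidences, ranking_length=0, renormalize=False, keep_all=False):
    """Sketch of the proposed behaviour: keep only the top `ranking_length`
    confidences (all of them if ranking_length <= 0); with keep_all=True the
    pruned entries are set to 0 instead of dropped; renormalize only on request."""
    confidences = np.asarray(confidences, dtype=float)
    order = np.argsort(confidences)[::-1]  # indices sorted by descending confidence
    if ranking_length > 0:
        if keep_all:
            # Policy case: report a value for every label, zeroing the pruned ones.
            masked = np.zeros_like(confidences)
            top = order[:ranking_length]
            masked[top] = confidences[top]
            confidences = masked
        else:
            # Classifier case: report only the top `ranking_length` confidences.
            confidences = confidences[order[:ranking_length]]
    if renormalize and confidences.sum() > 0:
        confidences = confidences / confidences.sum()
    return confidences
```

For example, `apply_ranking([0.5, 0.3, 0.1, 0.1], ranking_length=2, keep_all=True, renormalize=True)` would yield `[0.625, 0.375, 0, 0]` - zeroed entries don't contribute to the renormalization sum.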
> - LABEL_RANKING as default for ranking_length

👍 I wrote 10 above because `LABEL_RANKING` is from `nlu.classifiers` - will move this parameter to some other `constants.py` then 🤔
> - report ranking_length many top confidences if ranking_length>0, and otherwise all confidences

This isn't possible for `TED`. A policy prediction must contain values for all predictions. I think the closest we can get to consistent behaviour is the setting-to-0 approach (and not normalising).
> - By default, renormalize_confidences is set to False.

Wasn't sure if it's ok to change that one; I'm fine with breaking things to unify the behaviour :D
> This isn't possible for TED. A policy prediction must contain values for all predictions. Think the closest we can get to consistent behaviour is the setting-to-0 approach (and not normalising).

Ahh right, forgot about it. So, in the setting-to-0 approach, the confidences that have been set to 0 don't contribute to renormalization (if renormalization is turned on), right?
> Wasn't sure if it's ok to change that one, I'm fine with breaking things to unify the behaviour

This is exactly why this needs to be done in 3.0. Renormalization is completely wrong IMO, but if it floats the boat for some users we should support it - just not as the default behaviour 😅
> Ahh right, forgot about it. So, in the setting-to-0 approach, the confidences that have been set to 0 don't contribute to renormalization (if renormalization was turned on), right?

I'm not sure what you mean by them not contributing to renormalisation. We normalise by the sum of the confidences. If we remove them from the list (in DIET), they don't contribute to the sum, just like they don't contribute to the sum if we set them to 0 (in TED).

Btw, there's also a tiny bug in the normalisation: `new_values[new_values < ranked[ranking_length - 1]] = 0`. It will probably close to never happen (only in case we have multiple entries with the same probability that could be ranked as the `ranking_length`-th item), but I'll still fix it along with the rest :)
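A minimal reproduction of that tie case (illustrative values, not the actual Rasa code): with a tie at the `ranking_length`-th position, the strict `<` comparison keeps every tied value, so more than `ranking_length` confidences survive the masking.

```python
import numpy as np

ranking_length = 2
new_values = np.array([0.5, 0.2, 0.2, 0.1])  # tie at the 2nd-ranked value

# The masking step under discussion: zero everything strictly below the
# ranking_length-th largest value.
ranked = np.sort(new_values)[::-1]
new_values[new_values < ranked[ranking_length - 1]] = 0

print(np.count_nonzero(new_values))  # 3 values survive, not 2
```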
> This is exactly why this needs to be done in 3.0. Renormalization is completely wrong IMO but if it floats the boat for some users, we should support it, but just not the default behaviour 😅

👍
> I'm not sure what you mean by they don't contribute to renormalisation. We normalise by the sum of the confidences. If we remove them from the list (in DIET), they don't contribute to the sum, just like they don't contribute to the sum if we set them to 0

Totally my bad, don't know what I was thinking.

No worries, happens to me all the time ^^. Btw, actually there's no reason to mask the probabilities in `TED`, is there? The ensemble won't care about the values that we are setting to 0. So maybe we set the default to 0 instead of `LABEL_RANKING_LENGTH`?
What problem are you trying to solve?

Currently, all ML components (`DIETClassifier`, `ResponseSelector`, `TEDPolicy`) have a parameter called `ranking_length` which is supposed to be used when the user does not want to see the confidences of all possible candidate labels (intents / actions) and instead wants to see a smaller number of labels.

However, what happens by default is that the model's predicted confidences for the labels inside the ranking length are renormalized to sum up to 1. This is wrong because it can artificially boost the confidences of labels and hides the original confidences that were predicted by the model itself. The renormalization step should be decoupled from this parameter.

Removing the renormalization step entirely is also not possible, because if a user has a lot of intents, say 200, all the predicted confidences tend to be very low, which is problematic for fallback. So, it should be done only when needed, i.e. when the user sees a benefit of it. There is anecdotal evidence of this happening with customers.
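A small sketch of the boosting effect described above (illustrative numbers, not from a real model): with 200 intents, renormalizing only the top 10 softmax confidences inflates the winner's reported confidence by more than an order of magnitude.

```python
import numpy as np

# 200 intents; the winning intent is only mildly preferred by the model.
logits = np.zeros(200)
logits[0] = 2.0
confidences = np.exp(logits) / np.exp(logits).sum()
print(round(confidences[0], 3))  # 0.036 - the model's actual confidence

# Renormalizing only the top `ranking_length` confidences to sum to 1
# artificially boosts them:
ranking_length = 10
top = np.sort(confidences)[::-1][:ranking_length]
renormalized = top / top.sum()
print(round(renormalized[0], 3))  # 0.451 - much higher than the model predicted
```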
What's your suggested solution?

Have a separate parameter, something like `renormalize_confidences`, which if set to `True` applies the renormalization.

Note that the re-normalization step is currently only applied when `loss_type` is set to `cross_entropy`, and that condition should still stay.

Examples (if relevant)
No response
Is anything blocking this from being implemented? (if relevant)
No response
Definition of Done

- Renormalization is decoupled from the `ranking_length` parameter.