action_unlikely_intent prediction hides TED errors in failed_test_stories.yml

Rasa version: 2.7.1

Python version: 3.7.10

Operating system (windows, osx, ...): osx

Issue: When running rasa test with a policy ensemble that includes UnexpecTEDIntentPolicy, if the ensemble predicts a wrong action right after UnexpecTEDIntentPolicy predicts action_unlikely_intent, rasa test records that action as correct in the metrics and the story does not show up in failed_test_stories.yml.

Let’s take an example test story:

- story:
  - intent: greet
  - action: utter_greet

When the first part of this test story:

- story:
  - intent: greet

is passed to the ensemble, it can result in action_unlikely_intent being triggered:

- story:
  - intent: greet
  - action: action_unlikely_intent

Since the last action triggered by the ensemble was action_unlikely_intent, the ensemble should be queried again to see whether the other policies can actually predict utter_greet. So the following story:

- story:
  - intent: greet
  - action: action_unlikely_intent

should be fed to the ensemble again. There are now two possible cases:

Case 1: utter_greet gets predicted, so the story looks like this:

- story:
  - intent: greet
  - action: action_unlikely_intent
  - action: utter_greet

This story should go in stories_with_warnings.yml, as the expected action was correctly predicted after action_unlikely_intent.

Case 2: utter_greet does not get predicted but some other action does, so the story looks like this:

- story:
  - intent: greet
  - action: action_unlikely_intent
  - action: some_other_action

This story should end up in failed_test_stories.yml because some_other_action is predicted instead of utter_greet. Right now, case 2 is not being checked, which can be reproduced with this example:

# Make sure you have checked out the intent-ted branch on your rasa install
git clone https://github.com/rasahq/ited-tolerance-experiments
cd ited-tolerance-experiments

# Train a TED-only ensemble
rasa train core -c ted-config.yml -s dataset1/ --out models/ted
# Train a TED + UnexpecTEDIntentPolicy ensemble
rasa train core -c ited-config.yml -s dataset1/ --out models/ited

Run rasa test on the TED-only ensemble:

rasa test core -s test-bug/ -m models/ted --out results/ted

and you should see the following results:

...
2021-07-06 16:40:34 INFO     rasa.core.test  - Evaluation Results on CONVERSATION level:
2021-07-06 16:40:34 INFO     rasa.core.test  -  Correct:          0 / 1
2021-07-06 16:40:34 INFO     rasa.core.test  -  Accuracy:         0.000
...
2021-07-06 16:40:36 INFO     rasa.core.test  - Evaluation Results on ACTION level:
2021-07-06 16:40:36 INFO     rasa.core.test  -  Correct:          10 / 11
2021-07-06 16:40:36 INFO     rasa.core.test  -  F1-Score:         0.879
2021-07-06 16:40:36 INFO     rasa.core.test  -  Precision:        0.864
2021-07-06 16:40:36 INFO     rasa.core.test  -  Accuracy:         0.909
2021-07-06 16:40:36 INFO     rasa.core.test  -  In-data fraction: 0.455
...

which are correct.

Run the same test on the ensemble with UnexpecTEDIntentPolicy:

rasa test core -s test-bug/ -m models/ited --out results/ited

and you get:

...
2021-07-06 16:41:36 INFO     rasa.core.test  - Evaluation Results on CONVERSATION level:
2021-07-06 16:41:36 INFO     rasa.core.test  -  Correct:          1 / 1
2021-07-06 16:41:36 INFO     rasa.core.test  -  Accuracy:         1.000
...
2021-07-06 16:41:37 INFO     rasa.core.test  - Evaluation Results on ACTION level:
2021-07-06 16:41:37 INFO     rasa.core.test  -  Correct:          10 / 10
2021-07-06 16:41:37 INFO     rasa.core.test  -  F1-Score:         1.000
2021-07-06 16:41:37 INFO     rasa.core.test  -  Precision:        1.000
2021-07-06 16:41:37 INFO     rasa.core.test  -  Accuracy:         1.000
2021-07-06 16:41:37 INFO     rasa.core.test  -  In-data fraction: 0.5
...

The failed test stories from the TED-only run (results/ted/failed_test_stories.yml) show TED's error:

...
  - action: utter_flight_available
  - intent: deny
  - action: utter_anything_else
  - intent: deny
  - action: utter_goodbye  # predicted: utter_anything_else

However, the corresponding file for the combined TED + UnexpecTEDIntentPolicy ensemble (results/ited/failed_test_stories.yml) is empty, while the action_unlikely_intent prediction shows up in results/ited/stories_with_warnings.yml.
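To compare the two runs without opening each file by hand, a small script along these lines could help. It is only a sketch: `summarize_results` is a hypothetical helper, not part of Rasa, and it assumes that wrong predictions are marked with a `# predicted:` comment, as in the excerpt above.

```python
from pathlib import Path

# File names taken from the rasa test output directories described above.
RESULT_FILES = ("failed_test_stories.yml", "stories_with_warnings.yml")


def summarize_results(results_dir: str) -> dict:
    """Count marked mispredictions and action_unlikely_intent lines per file.

    Hypothetical helper: it treats the files as plain text and does not
    parse the YAML, so it only relies on the markers visible in this issue.
    """
    summary = {}
    for name in RESULT_FILES:
        path = Path(results_dir) / name
        # A missing file is treated the same as an empty one.
        lines = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
        summary[name] = {
            "wrong_predictions": sum("# predicted:" in line for line in lines),
            "unlikely_intent_lines": sum("action_unlikely_intent" in line for line in lines),
        }
    return summary


if __name__ == "__main__":
    for run in ("results/ted", "results/ited"):
        print(run, summarize_results(run))
```

With the bug present, one would expect the results/ted summary to report a wrong prediction in failed_test_stories.yml and the results/ited summary to report none.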

WARNING: The behaviour proposed above does not change the case where action_unlikely_intent was actually expected at a conversation turn but was not predicted by the ensemble. Such cases should still go to failed_test_stories.yml.
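The expected behaviour described in this issue can be summarised in a short sketch. This is not Rasa's evaluation code: `classify_turn` and `target_file` are hypothetical names, and the sketch assumes the ensemble's consecutive predictions for one expected action are available as a list.

```python
from typing import List, Optional

UNLIKELY = "action_unlikely_intent"


def classify_turn(expected: str, predicted: List[str]) -> str:
    """Return "correct", "warning", or "failed" for one expected action.

    `predicted` lists what the ensemble produced for this turn, in order,
    e.g. ["action_unlikely_intent", "utter_greet"] after a re-query.
    """
    if expected == UNLIKELY:
        # The test story itself expects action_unlikely_intent here,
        # so anything else counts as a failure.
        return "correct" if predicted[:1] == [UNLIKELY] else "failed"

    # Ignore unexpected action_unlikely_intent and look at what follows it.
    had_unlikely = UNLIKELY in predicted
    remaining = [action for action in predicted if action != UNLIKELY]
    final = remaining[0] if remaining else None

    if final != expected:
        return "failed"  # case 2: some other action was predicted
    return "warning" if had_unlikely else "correct"  # case 1 or plain success


def target_file(outcomes: List[str]) -> Optional[str]:
    """Pick the output file for a story from its per-turn outcomes."""
    if "failed" in outcomes:
        return "failed_test_stories.yml"
    if "warning" in outcomes:
        return "stories_with_warnings.yml"
    return None  # fully correct story, nothing to report


if __name__ == "__main__":
    print(classify_turn("utter_greet", [UNLIKELY, "utter_greet"]))
    print(classify_turn("utter_greet", [UNLIKELY, "some_other_action"]))
```

A failure takes precedence over a warning in `target_file`, so a story that contains both an unexpected action_unlikely_intent and a wrong action lands in failed_test_stories.yml.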

RasaHQ/rasa #9057