RasaHQ / rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
https://rasa.com/docs/rasa/
Apache License 2.0

Lookup Table for featurized tracker messages #9020

Closed: twerkmeister closed this issue 3 years ago

twerkmeister commented 3 years ago

Description of Problem: Currently, a lot of computation is duplicated because messages in trackers are featurized multiple times. There are two kinds of duplication:

  1. Among the positions of the sliding window across a single conversation
  2. Whenever we have identical messages across conversations
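To make the two kinds of duplication concrete, here is a small counting sketch. It assumes a toy model in which each training example is a window of the last `window` messages ending at every position in a conversation, and every window re-featurizes all of its messages. The data and function names are hypothetical and not part of Rasa.

```python
from typing import List

# Hypothetical toy data: each conversation is a list of user/bot messages.
conversations: List[List[str]] = [
    ["hi", "hello! how can I help?", "book a table", "for how many people?"],
    ["hi", "hello! how can I help?", "cancel my booking", "done."],
]

def count_naive_featurizations(convs: List[List[str]], window: int) -> int:
    """Count featurizer calls if every sliding-window position re-featurizes its messages."""
    calls = 0
    for conv in convs:
        for end in range(1, len(conv) + 1):
            start = max(0, end - window)
            calls += end - start  # every message in this window is featurized again
    return calls

def count_unique_messages(convs: List[List[str]]) -> int:
    """Count featurizer calls if each distinct message is featurized exactly once."""
    return len({msg for conv in convs for msg in conv})

naive = count_naive_featurizations(conversations, window=3)
unique = count_unique_messages(conversations)
print(naive, unique)  # 18 6
```

Even in this tiny example, overlapping windows (duplication 1) and the shared opening turns (duplication 2) triple the number of featurizer calls.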

With small to medium-sized datasets, this is not an issue. For larger datasets such as MultiWOZ, this duplication adds almost an hour of preprocessing time.

Overview of the Solution: Featurize each unique message once and store the result to be used downstream by other components.
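A minimal sketch of such a lookup table follows. It assumes messages can be keyed by their raw text. A real key would likely need to cover everything the featurizers see, such as intent, entities, or action name. `toy_featurize`, `FeatureLookupTable`, and `build_lookup_table` are hypothetical stand-ins, not Rasa classes.

```python
from typing import Callable, Dict, List, Tuple

Features = Tuple[float, ...]

def toy_featurize(text: str) -> Features:
    """Hypothetical stand-in for an expensive featurizer: (length, word count, vowel count)."""
    vowels = sum(ch in "aeiou" for ch in text.lower())
    return (float(len(text)), float(len(text.split())), float(vowels))

class FeatureLookupTable:
    """Featurizes each unique message once and serves cached features afterwards."""

    def __init__(self, featurize: Callable[[str], Features]) -> None:
        self._featurize = featurize
        self._table: Dict[str, Features] = {}
        self.featurizer_calls = 0  # counts how often the expensive featurizer actually ran

    def add(self, text: str) -> None:
        # Only featurize messages we have not seen before.
        if text not in self._table:
            self._table[text] = self._featurize(text)
            self.featurizer_calls += 1

    def __getitem__(self, text: str) -> Features:
        return self._table[text]

def build_lookup_table(
    convs: List[List[str]], featurize: Callable[[str], Features]
) -> FeatureLookupTable:
    """Fill the table with every message from every conversation."""
    table = FeatureLookupTable(featurize)
    for conv in convs:
        for msg in conv:
            table.add(msg)
    return table

conversations = [
    ["hi", "book a table", "for two"],
    ["hi", "book a table", "cancel it"],
]
table = build_lookup_table(conversations, toy_featurize)
print(table.featurizer_calls)  # 4 unique messages -> 4 featurizer calls

# Downstream components read windows from the table instead of re-featurizing.
window = [table[m] for m in conversations[1][-2:]]
```

Every sliding-window position and every conversation then shares the same cached feature objects, so the cost of featurization scales with the number of unique messages rather than with the number of training examples.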

The v3 architecture prototype contains a prototype implementation of this feature. There is also a necessary, but so far unmerged, fix.

I previously extracted the code from the prototype to run tests on the current architecture. The latest version can also be found in the combined-e2e-fixes branch.

This feature would also unlock batch encoding during training, which would be too computationally expensive unless the features were already cached in the lookup table.
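One way to read "batch encoding" here is that training examples are stored only as lists of message keys, and each batch's feature tensors are assembled on the fly by cheap lookups into the cached table. The sketch below assumes that reading. It uses integer message IDs as keys and right-pads each batch to its longest example with zero vectors. `iter_batches` and its parameters are hypothetical, not Rasa's API.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

Features = Tuple[float, ...]

def iter_batches(
    examples: Sequence[List[int]],
    table: Dict[int, Features],
    batch_size: int,
    feature_dim: int,
) -> Iterator[List[List[Features]]]:
    """Yield padded batches, assembling features on the fly from the lookup table."""
    pad: Features = tuple(0.0 for _ in range(feature_dim))
    for start in range(0, len(examples), batch_size):
        chunk = examples[start:start + batch_size]
        max_len = max(len(ex) for ex in chunk)
        batch = []
        for ex in chunk:
            rows = [table[idx] for idx in ex]       # dictionary lookups, no featurization
            rows += [pad] * (max_len - len(rows))   # right-pad to this batch's longest example
            batch.append(rows)
        yield batch

# Hypothetical cached features for three unique messages.
table = {0: (1.0, 0.0), 1: (0.0, 1.0), 2: (1.0, 1.0)}
# Each training example is a window of message IDs.
examples = [[0], [0, 1], [0, 1, 2]]
batches = list(iter_batches(examples, table, batch_size=2, feature_dim=2))
```

Because padding is computed per batch rather than across the whole dataset, memory only has to hold one batch of dense features at a time. That trade-off is only attractive when each lookup is cheap, which is what the cache provides.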

Open Issues:

Definition of Done:

ka-bu commented 3 years ago

closed via #9405