💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
Description of Problem:
Currently a lot of duplicate computation is done by featurizing the messages in trackers multiple times. There are two kinds of duplication:
- Among the positions of the sliding window across a single conversation
- Whenever we have identical messages across conversations
With small- to medium-sized datasets this is not an issue. For larger datasets such as MultiWOZ, this duplication adds almost an hour of additional preprocessing time.
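To make the two kinds of duplication concrete, here is a minimal sketch of a naive sliding-window featurizer counting how often each message text would be featurized. The function and data are illustrative only, not the actual Rasa API:

```python
# Hypothetical sketch: count featurization calls under a naive sliding
# window (names and data are illustrative, not the real Rasa code).
from collections import Counter

def naive_featurize_calls(conversations, window_size=3):
    """Count how often each message text would be featurized."""
    calls = Counter()
    for messages in conversations:
        # One window position per prediction point in the conversation.
        for end in range(1, len(messages) + 1):
            window = messages[max(0, end - window_size):end]
            for text in window:
                calls[text] += 1  # each occurrence triggers a featurization
    return calls

conversations = [
    ["hi", "book a table", "for two", "thanks"],
    ["hi", "book a table", "tomorrow", "thanks"],
]
calls = naive_featurize_calls(conversations)
# "hi" and "book a table" recur across window positions *and* across
# conversations, so each is featurized six times here.
print(calls["hi"], calls["book a table"])  # → 6 6
```

Both kinds of duplication compound: the per-conversation window overlap multiplies with every conversation that repeats the same message.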
Overview of the Solution:
Featurize each unique message once and store the result to be used downstream by other components.
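The featurize-once idea can be sketched as a lookup table keyed by message text; the helper names below are illustrative assumptions, not the prototype's actual interface:

```python
# Minimal featurize-once sketch: cache features per unique message text
# (function and cache names are illustrative, not the real Rasa API).
def make_cached_featurizer(featurize):
    cache = {}  # lookup table: message text -> features
    def featurize_cached(text):
        if text not in cache:
            cache[text] = featurize(text)  # computed only once per unique text
        return cache[text]
    return featurize_cached, cache

calls = []
def expensive_featurize(text):
    calls.append(text)          # track how often the real work runs
    return [float(len(text))]   # stand-in for a real feature vector

featurize, cache = make_cached_featurizer(expensive_featurize)
for text in ["hi", "book a table", "hi", "hi", "book a table"]:
    featurize(text)
print(len(calls))  # → 2: one featurization per unique message
```

Downstream components then read from the lookup table instead of re-running the featurizer for every window position and every repeated message.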
I have previously extracted the code from the prototype to run tests on the current architecture; the latest version can also be found in the combined-e2e-fixes branch.
This feature would also unlock batch encoding during training, which would be too computationally expensive without having the features cached in the lookup table beforehand.
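The batch-encoding idea this would unlock can be sketched as follows: collect the unique messages once, encode them in a single batch, and fill the lookup table from the result. `batch_encode` here is a hypothetical stand-in for an expensive model forward pass, not an actual Rasa function:

```python
# Sketch of batch encoding made affordable by the lookup table
# (batch_encode is a hypothetical stand-in for a model forward pass).
def batch_encode(texts):
    # Stand-in: a real implementation would run one batched model call.
    return [[float(len(t))] for t in texts]

def build_lookup_table(conversations):
    unique = sorted({text for messages in conversations for text in messages})
    features = batch_encode(unique)  # one batched call instead of many single ones
    return dict(zip(unique, features))

conversations = [
    ["hi", "book a table", "thanks"],
    ["hi", "cancel it", "thanks"],
]
table = build_lookup_table(conversations)
print(len(table))  # → 4 unique messages encoded in one batch
```

Without the cache, the same batching would have to re-encode every occurrence of every message, which is what makes it too expensive today.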
Open Issues:
How to solve this for inference is still marked as a TODO in the current v3 architecture prototype.
A prototypical implementation of this feature exists inside the v3 architecture prototype, but it depends on a necessary, so far unmerged fix.
Definition of Done: