Cache `all_subclasses` to reduce `inspect` cost

Hey :sunflower: We are currently experimenting with using an event broker to also do logging based on tracker contents and found that rasa.shared.utils.common.all_subclasses is having a needlessly large (100%) overhead on our runtime.

Since the parameters variations of these functions are limited to the number of slot and event classes, we can use an LRU cache to cache all calls to this function. It is also very unlikely for the result to change during runtime, so no invalidation is needed.

We think that this will also help overall performance in more common circumstances.

We conducted three tests: a baseline run, using our event broker logging and the event broker logging with the LRU cache around all_subclasses. Sample size was 10 conversations in succession for each experiment.

Results:

baseline: 53s
event broker with logging: 114s
event broker with logging & LRU cache: 59s

As you can see, all_subclasses introduced a ~100% overhead while being essentially a lookup table. It would be nice and quite helpful if this change could be merged.

Proposed changes:

use LRU cache for rasa.shared.utils.common.all_subclasses to reduce overhead cost significantly

Status (please check what you already did):

[x] added some tests for the functionality (current tests cover this)
[x] updated the documentation (not necessary)
[ ] updated the changelog (please check changelog for instructions)
[x] reformat files using black (please check Readme for instructions)

Profiles of the individual experiments

Baseline: Baseline profile

Event broker with logging:

Event broker with logging & LRU cache:

RasaHQ / rasa

Cache `all_subclasses` to reduce `inspect` cost #13044