RasaHQ / rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
https://rasa.com/docs/rasa/
Apache License 2.0

Get rid of model access in `rasa.nlu.test` #9136

Closed wochinge closed 3 years ago

wochinge commented 3 years ago

Description of Problem: The `rasa.nlu.test` module has several paths where it directly accesses attributes on the processor's pipeline or even lower. This has the disadvantage that the test code is tightly coupled to the specific model architecture. With the revamp we want to reduce these ties. This will help us keep the interface of GraphProcessor small and explicit and will reduce the number of breaking changes when we start to implement the actual graph.

Related implementation proposal page

Locations where we currently access things which are too low level:

  1. remove_pretrained_extractors: We can instead remove the predictions of pretrained entity extractors from the predictions themselves (basically excluding them from the assert)
  2. is_classifier_present: We can e.g. look at a single test prediction to see if an intent was added (see the sketch after this list)
  3. is_response_selector_present: We can e.g. look at a single test prediction to see if the response selector added some result
  4. get_available_response_selector_types: See related Slack thread here. We can basically use the same approach as here
  5. is_entity_extractor_present: We can e.g. look at the test data and the test predictions to see what extractors are present (or whether one is present)
  6. [get_entity_extractors](https://github.com/RasaHQ/rasa/blob/e1d2b6c4a37f7ed0dda93efbacd386f01556d119/rasa/nlu/test.py#L1332): We can e.g. look at the test data and the test predictions to see what extractors are present (or whether one is present)
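For points 2, 3, 5, and 6, a minimal sketch of what such prediction-based checks could look like, assuming each prediction is a parsed-message dict in the usual Rasa output format (the function names are illustrative, not existing API):

```python
from typing import Any, Dict, Set


def classifier_made_prediction(prediction: Dict[str, Any]) -> bool:
    """True if an intent classifier contributed to this prediction."""
    intent = prediction.get("intent") or {}
    return bool(intent.get("name"))


def response_selector_made_prediction(prediction: Dict[str, Any]) -> bool:
    """True if a response selector added results to this prediction."""
    return bool(prediction.get("response_selector"))


def extractors_in_prediction(prediction: Dict[str, Any]) -> Set[str]:
    """Names of the entity extractors that produced entities."""
    return {
        entity["extractor"]
        for entity in prediction.get("entities", [])
        if entity.get("extractor")
    }
```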

Overview of the Solution: I outlined some proposed ways of getting rid of these accesses. Anything else which doesn't require an additional method on the GraphProcessor is also fine.

Examples (if relevant):

Blockers (if relevant):

Definition of Done:

twerkmeister commented 3 years ago

@wochinge @joejuzl

I've been thinking a bit about this issue, and I would like to discuss an alternative route here...

With regards to point 1 (remove_pretrained_extractors): in my opinion, this logic should disappear entirely. On the one hand it makes sense not to test those pre-trained models. On the other hand, this logic makes it impossible for users to assess the value of these components in their pipeline. They can't answer the question "What is the effect of adding duckling to the pipeline?" for their datasets right now, since duckling will always be deactivated during testing. I think it makes much more sense not to force this logic and to let people add and remove components to/from their pipeline themselves, something that is easily and intuitively done by editing the config file. It also brings the testing setup closer to the actual production system.

EDIT: Ah, I remembered that the problem here is that there tend to be no annotations for the pre-trained extractors in the data. That's why it's often not feasible to evaluate them anyway. So for this part I'll go ahead as planned and remove their predictions from the output. The following question still stands unchanged.
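A minimal sketch of that plan, dropping the pretrained extractors' predictions from the evaluation output instead of removing the components from the pipeline; the extractor names and the helper name are assumptions for illustration:

```python
from typing import Any, Dict, List

# Assumed set of pretrained extractor names (mirrors the idea behind the
# existing PRETRAINED_EXTRACTORS constant; exact contents are illustrative).
PRETRAINED_EXTRACTOR_NAMES = {"DucklingEntityExtractor", "SpacyEntityExtractor"}


def exclude_pretrained_entity_predictions(
    predicted_entities: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Keep only entities predicted by trainable extractors."""
    return [
        entity
        for entity in predicted_entities
        if entity.get("extractor") not in PRETRAINED_EXTRACTOR_NAMES
    ]
```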

With that point out of the way: none of the others manipulate the graph. They simply ask questions about the graph. That seems alright to me, and is something we might be doing in other places as well. I could turn the logic around as you propose and instead check a processed set of messages for the traces of said components, but it seems much more straightforward to look at the graph than to do this reverse-engineering approach. The graph shouldn't be a black box, after all.

What do you think?

wochinge commented 3 years ago

> I could turn the logic around as you propose to instead check a processed set of messages for the traces of said components, but it seems just much more straightforward to have a look at the graph than to do this reverse engineering approach. The graph shouldn't be a black box after all

This was actually our first idea, too. The thing is that the graph doesn't know what type a certain graph node / component is, so it's not even possible for the graph to provide that information. The recipes could provide it, but the graph itself doesn't really care which graph component a graph node uses.

Our goal is to run (relatively) arbitrary model architectures with the graph. Hence, I think it's beneficial to expose as little information as possible about the graph structure to the outside world, to avoid that consumers tie themselves too closely to certain model architectures.

And regarding your edge case about empty entities which you mentioned in Slack: could we also inspect the training data? If there are no entities in the training data and none in the predictions -> don't evaluate entities (or don't log the result)?
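A hedged sketch of that check, assuming the test examples and predictions are available as lists of message dicts (names are illustrative, not existing API):

```python
from typing import Any, Dict, List


def should_evaluate_entities(
    test_examples: List[Dict[str, Any]], predictions: List[Dict[str, Any]]
) -> bool:
    """Skip entity evaluation if neither the data nor the predictions contain entities."""
    has_annotated_entities = any(example.get("entities") for example in test_examples)
    has_predicted_entities = any(prediction.get("entities") for prediction in predictions)
    return has_annotated_entities or has_predicted_entities
```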

twerkmeister commented 3 years ago

Alright, happy to give this a shot and see if it makes sense to treat the graph more as a black box. I've started with the changes in the PR but am unfortunately running out of time @wochinge. I left a brief note there about the current status.