RasaHQ / rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
https://rasa.com/docs/rasa/
Apache License 2.0
18.91k stars 4.63k forks source link

Improve Domain Loading #10510

Closed carlad closed 2 years ago

carlad commented 2 years ago

What problem are you trying to solve?

This issue comes out of an initial bug reported by @raoulvm , where use_entities was not being set correctly when domain config is split over several domain .yml files.

Investigating this bug led to the realisation of a problem with the current Domain loading implementation in the case of multiple domain files.

Currently, with a split domain, the Domain object is constructed iteratively: with separate Domains are created from each domain file and merged together sequentially. As a step in creating a Domain includes linking the intents in a file with the entities they will use: either True (to use all entities in the entire domain), use_entities (to use specific entities, when these entitites are less than half of all entities), or ignore_entities (when these entities are less than half of all entities), a problem arises when entities are configured in different domain files. Each file does not know about entities in other files, and consequently, as each Domain object is created and merged sequentially, the method to link entities to intents, _transform_intents_for_file, can make mistakes.

What's your suggested solution?

@raoulvm proposed a solution which has been implemented in this PR, but it duplicates code somewhat in a process that couples validation with attribute setting.

I explored the possibility of decoupling the load process by first validating and merging all domain yml files recursively into a single defaultdict(list). Locally I used split domains from a couple of customers to create single dict objects that appeared legit (see below for links to the output).

Passing this dict to the from_dict method, created correct entity/intent association in the Domain, but broke other steps. A PR reflecting these changes, and the breakages, can be found here.

Examples (if relevant)

Example Merged Domain 1

Example Merged Domain 2

(these docs are only visible to Rasa team members)

Definition of Done

This issue was identified using rasa 2.8.x and that's the version I've been debugging against. I would confine any changes to 2.8.x for this issue, then create another for porting the changes to 3.0.x

carlad commented 2 years ago

UPDATE: This work became somewhat of a rabbit-hole that looked like it would require making a lot of changes to the logic. Since most of the team are away on vacation, it seemed wiser to park this for now and instead focus on the specfic bug that prompted this issue: #10137

Moving forward, and possibly as part of the Declarative Project Structure work, it would still be useful to improve how we load the Domain.

Some questions that arose while doing the work here:

raoulvm commented 2 years ago

Hi @carlad - I see you are peeking out of the rabbit hole you went into...
I would add to your list:

Happy holidays! 🎄

carlad commented 2 years ago

Thanks for those additions @raoulvm. I am indeed out of the rabbit hole 😉 Happy holidays to you too! 🎄 ❄️ 🌟