Closed carlad closed 2 years ago
UPDATE: This work became somewhat of a rabbit-hole that looked like it would require making a lot of changes to the logic. Since most of the team are away on vacation, it seemed wiser to park this for now and instead focus on the specfic bug that prompted this issue: #10137
Moving forward, and possibly as part of the Declarative Project Structure work, it would still be useful to improve how we load the Domain.
Some questions that arose while doing the work here:
.yml
domain files into a single defaultdict(list)
) was only able to handle files within the same directory. Should we establish some clear rules about how many levels of domain directories we will support?use-entities
and ignore_entities
on the same intent.dict
was expected, but a list
was received. This is due to merging the .yml
files into a defaultdict(list)
. So any logic change that adopts the approach of this solution needs to be aware of this.common.yml
file. Hi @carlad - I see you are peeking out of the rabbit hole you went into...
I would add to your list:
intents
, entities
, responses
(if not located in data!) and slots
(more?)raise
if elements from those keys are duplicated. Force customers to clean up! No more "overwriting" of data already read. The order of file reads must not affect the outcome.Happy holidays! 🎄
Thanks for those additions @raoulvm. I am indeed out of the rabbit hole 😉 Happy holidays to you too! 🎄 ❄️ 🌟
What problem are you trying to solve?
This issue comes out of an initial bug reported by @raoulvm , where
use_entities
was not being set correctly when domain config is split over several domain.yml
files.Investigating this bug led to the realisation of a problem with the current
Domain
loading implementation in the case of multiple domain files.Currently, with a split domain, the
Domain
object is constructed iteratively: with separateDomain
s are created from each domain file and merged together sequentially. As a step in creating aDomain
includes linking theintents
in a file with theentities
they will use: eitherTrue
(to use all entities in the entire domain),use_entities
(to use specific entities, when these entitites are less than half of all entities), orignore_entities
(when these entities are less than half of all entities), a problem arises when entities are configured in different domain files. Each file does not know about entities in other files, and consequently, as eachDomain
object is created and merged sequentially, the method to linkentities
tointents
,_transform_intents_for_file
, can make mistakes.What's your suggested solution?
@raoulvm proposed a solution which has been implemented in this PR, but it duplicates code somewhat in a process that couples validation with attribute setting.
I explored the possibility of decoupling the load process by first validating and merging all domain
yml
files recursively into a singledefaultdict(list)
. Locally I used split domains from a couple of customers to create single dict objects that appeared legit (see below for links to the output).Passing this dict to the
from_dict
method, created correct entity/intent association in theDomain
, but broke other steps. A PR reflecting these changes, and the breakages, can be found here.Examples (if relevant)
Example Merged Domain 1
Example Merged Domain 2
(these docs are only visible to Rasa team members)
Definition of Done
This issue was identified using rasa 2.8.x and that's the version I've been debugging against. I would confine any changes to 2.8.x for this issue, then create another for porting the changes to 3.0.x