Closed joejuzl closed 2 years ago
Domain merging fails when there is a dictionary entity.
This can be fixed by using the same logic currently used for intents
@joejuzl I would need more details on this bug, do you have an example or open ticket that I can look at? Not sure I understand what dictionary entity
refers to and what the logic used for intents is.
@ancalita
Take this example:
entities:
- GPE:
roles:
- destination
- origin
- name
The entity name
is a string, whereas GPE
is a mapping/dictionary.
If you are merging domain files and there is an entity like GPE
it will fail.
@joejuzl I wanted to ask for your opinion on two approaches I thought for unifying the 2 merge methods (merge
and merge_domain_dicts
):
I explained in this PR description why it's currently hard to unify these methods and why in the current PR they're only sharing the core functionality.
A different way is to modify the from_path
, from_file
... from_yaml
to return a domain dict, rather than a Domain
instance. Then I could modify load
method this way:
@classmethod
def load(cls, paths: Union[List[Union[Path, Text]], Text, Path]) -> "Domain":
if not paths:
raise InvalidDomain(
"No domain file was specified. Please specify a path "
"to a valid domain file."
)
elif not isinstance(paths, list) and not isinstance(paths, set):
paths = [paths]
domain = Domain.empty().as_dict()
for path in paths:
other_domain_dict = cls.from_path(path)
domain = cls.merge_domain_dicts(domain, other_domain_dict)
return cls.from_dict(domain)
This however would require a large amount of tests that load domain paths to be changed.
I also explored keeping a self.data
attribute storing the original domain dict in the constructor, then modifying the load
method as such:
@classmethod
def load(cls, paths: Union[List[Union[Path, Text]], Text, Path]) -> "Domain":
...
domain_dict = Domain.empty().as_dict()
for path in paths:
other = cls.from_path(path)
other_dict = other.as_dict()
other_dict.update(other.data)
domain_dict = cls.merge_domain_dicts(domain_dict, other_dict)
return cls.from_dict(domain_dict)
This might require amendments in tests/codebase where Domain
constructor is used directly to add the new positional argument data
.
Which do you think it's a better option?
I also explored keeping a self.data attribute storing the original domain dict in the constructor, then modifying the load method as such
I think this is a good approach. I also think this could help make other things simpler. A lot of the bugs and complexities from merging come from the fact that we have to get back to the original representation to merge e.g. all the annoyance with use_entities
-> used_entities
etc. We also have to deal with this when we persist the domain, i.e. take it back to a dict/yaml form (see stuff like Domain.cleaned_domain()
and .as_dict()
).
If instead we keep a self.data
, like you say, then we can always access this as the source of truth. And the rest of the domain is really just a window onto this original representation.
However I haven't looked deeply into the implications of this - so it may be harder than it sounds. Definitely worth exploring in my opinion though!
Background
A few bugs were identified around the merging of multiple domain files. While fixing these bugs relatively quickly we accrued some technical debt. After further investigation into how domain loading and merging works some inconsistencies and bugs were found. This ticket encompasses the tech debt and bug fixes, but not the behaviour changes. As the tech debt and bug are very related, it makes sense to do them in one go. More details: https://www.notion.so/rasa/Improve-domain-loading-b3998eafc5f7406dab0be88613dfd15d
Overview
Bugs
Tech debt
Domain.merge_domain_dicts
andDoman.merge
into one method. (Or at least share the core functionality)merge_domain_dicts
on a class. This should be fixed.merge_dicts
should not be public. We are restricted by and have to support the public interfaces of classes likeDomain
.self.duplicates
on theDomain
instance. Can we just warn as we merge/load and then raiseInvalidDomain
without storing this data as a public instance variable?domain.duplicates
is checked later byValidator.verify_domain_duplicates
. Maybe we can remove this and just do it in theDomain
classDomain._check_domain_sanity
appears to duplicate some of the duplication checks - can we remove/combine it?_transform_intent_properties_for_internal_use
. This is difficult to revert. Instead can we always keep the original domain dict on the instance, and use that for both domain persistence and merging?Definition of Done