RasaHQ / rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
https://rasa.com/docs/rasa/
Apache License 2.0

Improve domain loading: Tech debt and associated bugs #10807

Closed joejuzl closed 2 years ago

joejuzl commented 2 years ago

Background

A few bugs were identified around the merging of multiple domain files. Because we fixed them quickly, we accrued some technical debt. Further investigation into how domain loading and merging works turned up more inconsistencies and bugs. This ticket covers the tech debt and the bug fixes, but not the behaviour changes. As the tech debt and the bugs are closely related, it makes sense to address them in one go. More details: https://www.notion.so/rasa/Improve-domain-loading-b3998eafc5f7406dab0be88613dfd15d

Overview

Bugs

Tech debt

Definition of Done

ancalita commented 2 years ago

Domain merging fails when there is a dictionary entity.

This can be fixed by using the same logic currently used for intents

@joejuzl I need more details on this bug. Do you have an example or an open ticket that I can look at? I'm not sure I understand what "dictionary entity" refers to or what the logic used for intents is.

joejuzl commented 2 years ago

@ancalita

Take this example:

entities:
  - GPE:
      roles:
        - destination
        - origin
  - name

The entity name is a string, whereas GPE is a mapping/dictionary. If you merge domain files and any of them contains an entity like GPE, the merge fails.
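To make the mixed string/mapping case concrete, here is a minimal sketch of merging two entity lists by name. It is not Rasa's implementation: `entity_name` and `merge_entities` are hypothetical helpers, and the rule that a later entry replaces an earlier one with the same name is an assumption. The `try` block shows one way a naive merge can break (dicts are unhashable, so they cannot go into a set); the thread does not say whether this is the exact cause in Rasa.

```python
from typing import Any, Dict, List, Union

# An entity is either a plain name or a single-key mapping such as
# {"GPE": {"roles": ["destination", "origin"]}}.
Entity = Union[str, Dict[str, Any]]


def entity_name(entity: Entity) -> str:
    """Return the entity's name (hypothetical helper)."""
    if isinstance(entity, dict):
        # The single key of the mapping is the entity name.
        return next(iter(entity))
    return entity


def merge_entities(first: List[Entity], second: List[Entity]) -> List[Entity]:
    """Merge two entity lists keyed by name (hypothetical helper).

    Assumption: on a name clash, the entry from `second` replaces the
    entry from `first`. First-seen order is preserved.
    """
    merged: Dict[str, Entity] = {}
    for entity in first + second:
        merged[entity_name(entity)] = entity
    return list(merged.values())


domain_a = ["name", {"GPE": {"roles": ["destination"]}}]
domain_b = [{"GPE": {"roles": ["destination", "origin"]}}, "city"]

# A naive set-based merge breaks as soon as a mapping entity appears.
try:
    set(domain_a) | set(domain_b)
    naive_error = ""
except TypeError as error:
    naive_error = str(error)

merged = merge_entities(domain_a, domain_b)
```

Keying on the entity name means strings and mappings go through the same path, which is the kind of uniform handling the ticket suggests borrowing from intents.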

ancalita commented 2 years ago

@joejuzl I wanted to ask for your opinion on two approaches I thought of for unifying the two merge methods (merge and merge_domain_dicts):

  1. I explained in this PR description why it's currently hard to unify these methods and why in the current PR they're only sharing the core functionality.

  2. A different way is to modify from_path, from_file ... from_yaml to return a domain dict rather than a Domain instance. Then I could modify the load method this way:

    @classmethod
    def load(cls, paths: Union[List[Union[Path, Text]], Text, Path]) -> "Domain":
        if not paths:
            raise InvalidDomain(
                "No domain file was specified. Please specify a path "
                "to a valid domain file."
            )
        if not isinstance(paths, (list, set)):
            # Wrap a single path so the loop below handles every input alike.
            paths = [paths]

        # Start from an empty domain dict and merge each file's dict into it.
        domain = Domain.empty().as_dict()
        for path in paths:
            # Proposed change: from_path returns a domain dict, not a Domain.
            other_domain_dict = cls.from_path(path)
            domain = cls.merge_domain_dicts(domain, other_domain_dict)

        return cls.from_dict(domain)

This, however, would require changing a large number of tests that load domain paths.

joejuzl commented 2 years ago

I also explored keeping a self.data attribute storing the original domain dict in the constructor, then modifying the load method as such

I think this is a good approach, and it could make other things simpler too. Many of the bugs and complexities in merging come from having to get back to the original representation before we can merge, e.g. all the annoyance with use_entities -> used_entities. We have the same problem when we persist the domain, i.e. when we take it back to dict/YAML form (see Domain.cleaned_domain() and .as_dict()). If we instead keep a self.data, as you say, we can always access it as the source of truth, and the rest of the domain is really just a window onto this original representation.

However, I haven't looked deeply into the implications of this, so it may be harder than it sounds. Definitely worth exploring in my opinion though!
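A rough sketch of the self.data idea, to show what "a window onto the original representation" could look like. Everything here is hypothetical: `DomainSketch` is a stand-in rather than Rasa's Domain class, and the merge rule (append list items not already present, otherwise let the other domain's value win) is an assumption chosen only to keep the example short.

```python
import copy
from typing import Any, Dict, List


class DomainSketch:
    """Hypothetical stand-in for Domain that keeps the original dict."""

    def __init__(self, data: Dict[str, Any]) -> None:
        # The untouched original representation is the source of truth.
        self.data = copy.deepcopy(data)

    @property
    def entity_names(self) -> List[str]:
        # Derived view: computed from self.data on demand, never stored.
        return [
            next(iter(entity)) if isinstance(entity, dict) else entity
            for entity in self.data.get("entities", [])
        ]

    def as_dict(self) -> Dict[str, Any]:
        # Persisting needs no reverse conversion: hand back a copy.
        return copy.deepcopy(self.data)

    def merge(self, other: "DomainSketch") -> "DomainSketch":
        # Merge operates on the original dicts, not on derived attributes.
        merged = self.as_dict()
        for key, value in other.data.items():
            existing = merged.get(key)
            if isinstance(value, list) and isinstance(existing, list):
                # Assumed rule: append items that are not already present.
                merged[key] = existing + [v for v in value if v not in existing]
            else:
                # Assumed rule: the other domain's value wins.
                merged[key] = copy.deepcopy(value)
        return DomainSketch(merged)


a = DomainSketch({"entities": ["name"]})
b = DomainSketch(
    {"entities": [{"GPE": {"roles": ["origin"]}}, "name"], "intents": ["greet"]}
)
combined = a.merge(b)
```

Because merge and as_dict only ever touch self.data, there is no step that has to reconstruct the YAML shape from processed attributes, which is where the thread locates most of the merging bugs.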