BeWe11 / rasa_composite_entities

A Rasa NLU component for composite entities.
MIT License

Support for Multi-level Hierarchy #17

Closed shaswat-indian closed 3 years ago

shaswat-indian commented 4 years ago

Currently, CompositeEntityExtractor can be used multiple times in the Rasa configuration pipeline, but only the patterns of the last instance are saved, because the metadata of all other instances gets overwritten in the default composite_entities.json file. We can overcome this by reading composite_entities_file from the configuration (.yml) and using it as the file name under which each instance of the CompositeEntityExtractor component stores its metadata. Alternatively, we can generate a random file name each time the component is initialised.
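The random-file-name alternative could look roughly like the sketch below. The helper name `unique_metadata_filename` and the `composite_entities` prefix are assumptions made for illustration; this is not code from the component itself.

```python
import uuid


def unique_metadata_filename(prefix: str = "composite_entities") -> str:
    """Return a file name that is unique per call (hypothetical helper).

    Each component instance would call this once, so that two instances
    never write their metadata to the same file.
    """
    return f"{prefix}_{uuid.uuid4().hex}.json"


first = unique_metadata_filename()
second = unique_metadata_filename()
print(first != second)
```

A file name taken from the configuration is easier to inspect and reproduce, while a random name needs no extra configuration.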

BeWe11 commented 4 years ago

Hey @shaswat-indian

Why do you want to use this component multiple times in a single pipeline? I'm not sure that I understand your use case.

BeWe11 commented 4 years ago

After looking at your commit, I think I now understand what you are trying to achieve. You want to include composite entities in patterns of other composite entities.

Let's say you have a composite pattern C1 = @A + @B and another composite pattern C2 = @C1 + @B. If you have an utterance A + B + B, a first pass of the component would yield C1 + B and then a second pass would yield C2. Is that right?

To be honest, I don't like the fact that you are using multiple instances of the component to achieve this. Instead, this could be implemented by just reapplying the component logic as long as something has changed, i.e. a pattern has matched. You could probably get away with a simple while True loop that breaks once no change has been detected. Benefits of this approach would be:

  1. No need for multiple instances of the component,
  2. Supports arbitrarily deep "hierarchies".

Would this be sufficient to solve your problem, or am I missing something that would actually require using multiple instances of the component?
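A minimal sketch of the "reapply until nothing changes" idea, under simplifying assumptions: an utterance is modelled as a plain list of entity labels, and each composite pattern is a fixed sequence of at least two labels. The names `apply_once` and `resolve` are hypothetical and do not come from the component.

```python
from typing import Dict, List

# Hypothetical simplified model: composite name -> sequence of labels.
Patterns = Dict[str, List[str]]


def apply_once(labels: List[str], patterns: Patterns) -> List[str]:
    """Replace the first match of each pattern with the composite's name."""
    result = list(labels)
    for name, pattern in patterns.items():
        width = len(pattern)
        for start in range(len(result) - width + 1):
            if result[start:start + width] == pattern:
                result[start:start + width] = [name]
                break
    return result


def resolve(labels: List[str], patterns: Patterns) -> List[str]:
    """Reapply the patterns until a full pass changes nothing."""
    while True:
        updated = apply_once(labels, patterns)
        if updated == labels:
            return updated
        labels = updated


# C2 is listed first on purpose, so it can only match on the second pass.
patterns = {"C2": ["C1", "B"], "C1": ["A", "B"]}
print(resolve(["A", "B", "B"], patterns))  # ['C2']
```

Because every replacement shortens the list, the loop stops after a finite number of passes in this model.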

shaswat-indian commented 4 years ago

The solution you proposed seems to work for simple use cases like the one you described. Consider this use case:

{
 "composite_entities": [
   {
     "name": "C1",
     "patterns": [
       "@A @B"
     ]
   },
   {
     "name": "C2",
     "patterns": [
       "(@A)? @B"
     ]
   }
 ]
}

In this case, for input text matching "@A @B", we get both C1 as "@A @B" and C2 as " @B". I don't expect " @B" to be detected as part of C2, because it is already part of C1, which I would like to have higher priority.

This can be resolved with two separate files: the first instance of CompositeEntityExtractor takes the first file as input and the second instance takes the second file.

{
 "composite_entities": [
   {
     "name": "C1",
     "patterns": [
       "@A @B"
     ]
   }
 ]
}

{
 "composite_entities": [
   {
     "name": "C2",
     "patterns": [
       "(@A)? @B"
     ]
   }
 ]
}

This may seem like a trivial example that could be resolved with some clever regexes, but as the number of entities in the patterns increases, the complexity grows quickly.
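A rough sketch of the priority idea with two ordered passes. It assumes a simplified model in which the recognised entities are written as a string of "@Label" tokens and a matched span is replaced by the composite's own token, so a later pass can no longer see the consumed entities. The function `run_passes` and the regexes are illustrative assumptions, not the component's actual matching logic.

```python
import re
from typing import List, Tuple

# One pass is a list of (composite name, regex) pairs (hypothetical model).
Pass = List[Tuple[str, str]]


def run_passes(text: str, passes: List[Pass]) -> str:
    """Apply each pass in order, replacing matches with '@<composite name>'."""
    for patterns in passes:
        for name, pattern in patterns:
            text = re.sub(pattern, "@" + name, text)
    return text


first_pass: Pass = [("C1", r"@A @B\b")]          # higher priority
second_pass: Pass = [("C2", r"(?:@A )?@B\b")]    # only sees what is left

print(run_passes("@A @B", [first_pass, second_pass]))     # @C1
print(run_passes("@A @B @B", [first_pass, second_pass]))  # @C1 @C2
```

In this model, the "@B" that belongs to C1 is consumed in the first pass, so C2 can only claim entities that C1 left untouched.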

shaswat-indian commented 4 years ago

As for the multi-level entity hierarchy: if we can have multiple instances of CompositeEntityExtractor, we could support a hierarchical entity structure in Rasa, which is something other NLP platforms such as Dialogflow provide.

BeWe11 commented 3 years ago

I'm gonna close this for now, as there was no new activity and I'm not sure whether this is still relevant for anyone.