Closed: ttlekich closed this issue 4 years ago.
Hey @ttlekich,
thank you for mentioning this behavior; I'd consider it a bug. I've fixed it in e5c3109982e4dde3cad7b064806f1afd041d3cbe by sorting the entities by their `start` value before processing them.
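A minimal sketch of that kind of fix, assuming Rasa-style entity dicts that carry integer `start` and `end` character offsets. The function name `sort_entities_by_position` and the sample data are hypothetical illustrations, not the package's actual code:

```python
from typing import Any, Dict, List


def sort_entities_by_position(entities: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a new list of entities ordered by where they start in the message.

    Hypothetical helper: assumes each entity dict has an integer "start" offset.
    Python's sorted() is stable, so entities with equal "start" keep their
    original relative order. The input list is not modified.
    """
    return sorted(entities, key=lambda e: e["start"])


# Hypothetical extractor output for "wordA 30 wordB":
# the Duckling number was appended last instead of sitting between the words.
entities = [
    {"entity": "wordA", "value": "wordA", "start": 0, "end": 5},
    {"entity": "wordB", "value": "wordB", "start": 9, "end": 14},
    {"entity": "number", "value": 30, "start": 6, "end": 8},
]

print([e["entity"] for e in sort_entities_by_position(entities)])
# → ['wordA', 'number', 'wordB']
```

Returning a sorted copy rather than sorting in place keeps the original entity list untouched for any other pipeline component that reads it.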
Version 1.0.2 of the package contains the fix; just run `pip install -U rasa-composite-entities`.
Awesome, thank you!
From a brief glance, it seems as though the composite entity extractor relies on the entity list being sorted by order of appearance in the text. Most of the time, the entities are in the order they appear in the text; however, Duckling seems to break this order.
Examples (what currently happens with Rasa/Duckling):
Say I have a composite entity pattern:
`wordA number wordB`
where `wordA` and `wordB` are entities and `number` is a Duckling-parsable number. If the string `wordA 30 wordB` is parsed by Rasa with Duckling in the training pipeline, the composite entity does not get caught; only the primitive entities `wordA`, `wordB`, and `30` are caught.
If I had the composite entity pattern:
`wordA wordB number`
(same constraints as above), and the string `wordA wordB 30` was parsed by Rasa/Duckling, then the composite entity would get caught.
Duckling seems to put its parsed entities at the end of the entity array. Locally, I just sorted the entity array by `start` to get them back in order (in `_find_composite_entities`), and this fixes the issue above for me. I am not sure whether this is the best way to fix it or whether I missed the underlying issue, but I'd be happy to make a PR if any changes are needed.
Thank you!
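To make the ordering dependence concrete, here is a toy sketch. `find_pattern` is a hypothetical, deliberately simplified matcher that looks for consecutive entities whose names equal the pattern; it is not the algorithm used by rasa-composite-entities, only an illustration of why any matcher that walks the list in the given order misses the composite when Duckling appends its entity at the end:

```python
from typing import Any, Dict, List, Optional

Entity = Dict[str, Any]


def find_pattern(entities: List[Entity], pattern: List[str]) -> Optional[List[Entity]]:
    """Return the first run of consecutive entities whose names equal `pattern`.

    Hypothetical stand-in for composite matching: it trusts the list order,
    so it implicitly assumes entities are already sorted by text position.
    """
    n = len(pattern)
    for i in range(len(entities) - n + 1):
        window = entities[i : i + n]
        if [e["entity"] for e in window] == pattern:
            return window
    return None


# "wordA 30 wordB": the Duckling number comes last in the list, not in the middle.
duckling_order = [
    {"entity": "wordA", "start": 0, "end": 5},
    {"entity": "wordB", "start": 9, "end": 14},
    {"entity": "number", "start": 6, "end": 8},
]
pattern = ["wordA", "number", "wordB"]

print(find_pattern(duckling_order, pattern) is None)  # True: composite missed

# Sorting by "start" restores text order, and the composite is found.
by_start = sorted(duckling_order, key=lambda e: e["start"])
print(find_pattern(by_start, pattern) is not None)  # True: composite found
```

The same unsorted list does match `["wordA", "wordB", "number"]`, which mirrors the second example above: a pattern ending in the Duckling entity only works by coincidence, because appending to the end happens to agree with the text order.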