Closed tabergma closed 1 year ago
Exalate commented:
edubrigham commented:
Hi, this is exactly the issue I'm trying to solve right now. Can I use CRFEntityExtractor to extract "related" numbers based on entity groups by having training data as follows?
can I have [2]{"entity": "itemQty", "group": "1"} [beer]{"entity": "itemName", "group": "1"} [1]{"entity": "itemQty", "group": "2"} [water]{"entity": "itemName", "group": "2"} and [4]{"entity": "itemQty", "group": "3"} [mojito]{"entity": "itemName", "group": "3"}
And what happens if the user uses cardinals instead of numbers? Usually these are extracted by DucklingHTTPExtractor, which does not read "groups" or "roles".
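For reference, once entities come back with group labels, collecting them per group takes only a few lines. This is a minimal sketch, assuming each extracted entity is a dict with `entity`, `value` and an optional `group` key, as in the annotation above. `group_entities` is a hypothetical helper, not part of Rasa, and the input is hand-written rather than real extractor output.

```python
from collections import defaultdict


def group_entities(entities):
    """Collect entity values by their "group" label.

    Entities without a "group" key are collected under the key None.
    """
    grouped = defaultdict(dict)
    for ent in entities:
        grouped[ent.get("group")][ent["entity"]] = ent["value"]
    return dict(grouped)


# Hand-written input mirroring the annotated example (illustrative only).
entities = [
    {"entity": "itemQty", "value": "2", "group": "1"},
    {"entity": "itemName", "value": "beer", "group": "1"},
    {"entity": "itemQty", "value": "1", "group": "2"},
    {"entity": "itemName", "value": "water", "group": "2"},
    {"entity": "itemQty", "value": "4", "group": "3"},
    {"entity": "itemName", "value": "mojito", "group": "3"},
]

order = group_entities(entities)
for group, item in order.items():
    print(group, item["itemQty"], item["itemName"])
```

This only works if the quantity and the item name carry the same group label, which is exactly what an extractor without group support cannot provide.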
Exalate commented:
pawan-datascience commented:
Any updates on this? Makes sense to annotate tokens in the training data with roles:
`The car has [4]{"role": "wheel-count"} wheels and [2]{"role": "door-count"} doors`
Exalate commented:
JEM-Mosig commented:
It should be easier to add this support after resolving this issue
➤ Maxime Verger commented:
:bulb: Heads up! We're moving issues to Jira: https://rasa-open-source.atlassian.net/browse/OSS.
From now on, this Jira board is the place where you can browse (without an account) and create issues (you'll need a free Jira account for that). This GitHub issue has already been migrated to Jira and will be closed on January 9th, 2023. Do not forget to subscribe to the corresponding Jira issue!
:arrow_right: More information in the forum: https://forum.rasa.com/t/migration-of-rasa-oss-issues-to-jira/56569.
Description of Problem: Entity roles and groups are currently only supported by CRFEntityExtractor and DIETClassifier, i.e. components that extract custom entities. It might be helpful to add support for SpacyEntityExtractor and DucklingHTTPExtractor as well. If you are building a larger bot, you might have lots of different entities that are all of type "number". Example: I need a table for 4 with one seat for a toddler
Overview of the Solution: We don't have a clear solution in mind. One idea was to split off the role classification part from e.g. the CRFEntityExtractor and then wrap it around the Duckling extractor.
However, the more challenging part is the annotation of the training data. For SpacyEntityExtractor and DucklingHTTPExtractor you don't need to annotate anything right now. But if you want to learn specific roles for e.g. a "number", we need some kind of training data. One idea was to just annotate tokens in the training data with roles. For example:
The car has [4]{"role": "wheel-count"} wheels and [2]{"role": "door-count"} doors
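As a rough illustration of what such role-only annotations could yield, the sketch below parses the `[token]{json}` syntax from the example into plain text plus labelled character spans, which a role classifier could then train on. `parse_annotated` is a hypothetical helper written for this example, not Rasa's own training data parser. It assumes the JSON object contains no nested braces.

```python
import json
import re

# Matches "[text]{json}" where the JSON object has no nested braces.
ANNOTATION = re.compile(r"\[([^\]]+)\]\s*(\{[^{}]*\})")


def parse_annotated(example):
    """Return (plain_text, entities) for an annotated training example."""
    plain = []
    entities = []
    pos = 0     # read position in the annotated string
    length = 0  # length of the plain text built so far
    for m in ANNOTATION.finditer(example):
        before = example[pos:m.start()]
        plain.append(before)
        length += len(before)
        value = m.group(1)
        labels = json.loads(m.group(2))
        entities.append(
            {"start": length, "end": length + len(value), "value": value, **labels}
        )
        plain.append(value)
        length += len(value)
        pos = m.end()
    plain.append(example[pos:])
    return "".join(plain), entities


text, entities = parse_annotated(
    'The car has [4]{"role": "wheel-count"} wheels and [2]{"role": "door-count"} doors'
)
print(text)
for ent in entities:
    print(ent)
```

The resulting spans carry only a role and no entity type, so in this sketch the type would still have to come from the pretrained extractor (for example Duckling's "number") by matching character offsets.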