facebook / duckling

Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.

Why are consecutive dates being intersected? #663

Open PrajnyaSatish opened 2 years ago

PrajnyaSatish commented 2 years ago

When dates are written consecutively, for example "March 4, March 16", the first span is not parsed correctly. This is what is returned:

debug (makeLocale EN $ Just US) "March 4, March 23" [This Time]
intersect by ",", "of", "from", "'s" (March 4, March)
-- <named-month> <day-of-month> (March 4)
-- -- March (March)
-- -- -- regex (March)
-- -- integer (numeric) (4)
-- -- -- regex (4)
-- regex (,)
-- March (March)
-- -- regex (March)
<named-month> <day-of-month> (March 23)
-- March (March)
-- -- regex (March)
-- integer (numeric) (23)
-- -- regex (23)
> [Entity {dim = "time", body = "March 4, March", value = RVal Time (TimeValue (SimpleValue (InstantValue {vValue = 2013-03-04 00:00:00 -0200, vGrain = Day})) [SimpleValue (InstantValue {vValue = 2013-03-04 00:00:00 -0200, vGrain = Day}),SimpleValue (InstantValue {vValue = 2014-03-04 00:00:00 -0200, vGrain = Day}),SimpleValue (InstantValue {vValue = 2015-03-04 00:00:00 -0200, vGrain = Day})] Nothing), start = 0, end = 14, latent = False, enode = Node {nodeRange = Range 0 14, token = Token Time TimeData{latent=False, grain=Day, form=Nothing, direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 0 7, token = Token Time TimeData{latent=False, grain=Day, form=Nothing, direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 0 5, token = Token Time TimeData{latent=False, grain=Month, form=Just (Month {month = 3}), direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 0 5, token = Token RegexMatch (GroupMatch []), children = [], rule = Nothing}], rule = Just "March"},Node {nodeRange = Range 6 7, token = Token Numeral (NumeralData {value = 4.0, grain = Nothing, multipliable = False, okForAnyTime = True}), children = [Node {nodeRange = Range 6 7, token = Token RegexMatch (GroupMatch ["4"]), children = [], rule = Nothing}], rule = Just "integer (numeric)"}], rule = Just "<named-month> <day-of-month>"},Node {nodeRange = Range 7 8, token = Token RegexMatch (GroupMatch []), children = [], rule = Nothing},Node {nodeRange = Range 9 14, token = Token Time TimeData{latent=False, grain=Month, form=Just (Month {month = 3}), direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 9 14, token = Token RegexMatch (GroupMatch []), children = [], rule = Nothing}], rule = Just "March"}], rule = Just "intersect by \",\", \"of\", \"from\", \"'s\""}},Entity {dim = "time", body = "March 23", value = RVal Time (TimeValue (SimpleValue (InstantValue 
{vValue = 2013-03-23 00:00:00 -0200, vGrain = Day})) [SimpleValue (InstantValue {vValue = 2013-03-23 00:00:00 -0200, vGrain = Day}),SimpleValue (InstantValue {vValue = 2014-03-23 00:00:00 -0200, vGrain = Day}),SimpleValue (InstantValue {vValue = 2015-03-23 00:00:00 -0200, vGrain = Day})] Nothing), start = 9, end = 17, latent = False, enode = Node {nodeRange = Range 9 17, token = Token Time TimeData{latent=False, grain=Day, form=Nothing, direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 9 14, token = Token Time TimeData{latent=False, grain=Month, form=Just (Month {month = 3}), direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 9 14, token = Token RegexMatch (GroupMatch []), children = [], rule = Nothing}], rule = Just "March"},Node {nodeRange = Range 15 17, token = Token Numeral (NumeralData {value = 23.0, grain = Nothing, multipliable = False, okForAnyTime = True}), children = [Node {nodeRange = Range 15 17, token = Token RegexMatch (GroupMatch ["23"]), children = [], rule = Nothing}], rule = Just "integer (numeric)"}], rule = Just "<named-month> <day-of-month>"}}]

The second month (but not its day-of-month) is being included in the previous span. This does not happen when different months appear in succession:

*Duckling.Debug> debug (makeLocale EN $ Just US) "March 4, may 23" [This Time]
<named-month> <day-of-month> (March 4)
-- March (March)
-- -- regex (March)
-- integer (numeric) (4)
-- -- regex (4)
<named-month> <day-of-month> (may 23)
-- May (may)
-- -- regex (may)
-- integer (numeric) (23)
-- -- regex (23)
[Entity {dim = "time", body = "March 4", value = RVal Time (TimeValue (SimpleValue (InstantValue {vValue = 2013-03-04 00:00:00 -0200, vGrain = Day})) [SimpleValue (InstantValue {vValue = 2013-03-04 00:00:00 -0200, vGrain = Day}),SimpleValue (InstantValue {vValue = 2014-03-04 00:00:00 -0200, vGrain = Day}),SimpleValue (InstantValue {vValue = 2015-03-04 00:00:00 -0200, vGrain = Day})] Nothing), start = 0, end = 7, latent = False, enode = Node {nodeRange = Range 0 7, token = Token Time TimeData{latent=False, grain=Day, form=Nothing, direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 0 5, token = Token Time TimeData{latent=False, grain=Month, form=Just (Month {month = 3}), direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 0 5, token = Token RegexMatch (GroupMatch []), children = [], rule = Nothing}], rule = Just "March"},Node {nodeRange = Range 6 7, token = Token Numeral (NumeralData {value = 4.0, grain = Nothing, multipliable = False, okForAnyTime = True}), children = [Node {nodeRange = Range 6 7, token = Token RegexMatch (GroupMatch ["4"]), children = [], rule = Nothing}], rule = Just "integer (numeric)"}], rule = Just "<named-month> <day-of-month>"}},Entity {dim = "time", body = "may 23", value = RVal Time (TimeValue (SimpleValue (InstantValue {vValue = 2013-05-23 00:00:00 -0200, vGrain = Day})) [SimpleValue (InstantValue {vValue = 2013-05-23 00:00:00 -0200, vGrain = Day}),SimpleValue (InstantValue {vValue = 2014-05-23 00:00:00 -0200, vGrain = Day}),SimpleValue (InstantValue {vValue = 2015-05-23 00:00:00 -0200, vGrain = Day})] Nothing), start = 9, end = 15, latent = False, enode = Node {nodeRange = Range 9 15, token = Token Time TimeData{latent=False, grain=Day, form=Nothing, direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 9 12, token = Token Time TimeData{latent=False, grain=Month, form=Just (Month {month = 5}), direction=Nothing, 
holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 9 12, token = Token RegexMatch (GroupMatch []), children = [], rule = Nothing}], rule = Just "May"},Node {nodeRange = Range 13 15, token = Token Numeral (NumeralData {value = 23.0, grain = Nothing, multipliable = False, okForAnyTime = True}), children = [Node {nodeRange = Range 13 15, token = Token RegexMatch (GroupMatch ["23"]), children = [], rule = Nothing}], rule = Just "integer (numeric)"}], rule = Just "<named-month> <day-of-month>"}}]

This is not specific to the Month grain either. For example, the utterance "Friday Friday" is parsed as:

*Duckling.Debug> debug (makeLocale EN $ Just US) "Friday Friday" [This Time]
intersect (Friday Friday)
-- Friday (Friday)
-- -- regex (Friday)
-- Friday (Friday)
-- -- regex (Friday)
[Entity {dim = "time", body = "Friday Friday", value = RVal Time (TimeValue (SimpleValue (InstantValue {vValue = 2013-02-15 00:00:00 -0200, vGrain = Day})) [SimpleValue (InstantValue {vValue = 2013-02-15 00:00:00 -0200, vGrain = Day}),SimpleValue (InstantValue {vValue = 2013-02-22 00:00:00 -0200, vGrain = Day}),SimpleValue (InstantValue {vValue = 2013-03-01 00:00:00 -0200, vGrain = Day})] Nothing), start = 0, end = 13, latent = False, enode = Node {nodeRange = Range 0 13, token = Token Time TimeData{latent=False, grain=Day, form=Nothing, direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 0 6, token = Token Time TimeData{latent=False, grain=Day, form=Just DayOfWeek, direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 0 6, token = Token RegexMatch (GroupMatch []), children = [], rule = Nothing}], rule = Just "Friday"},Node {nodeRange = Range 7 13, token = Token Time TimeData{latent=False, grain=Day, form=Just DayOfWeek, direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 7 13, token = Token RegexMatch (GroupMatch []), children = [], rule = Nothing}], rule = Just "Friday"}], rule = Just "intersect"}}]

I tried adding a sameGrain check so that tokens of the same grain are not intersected. But for an utterance like "March 3, Friday", March 3 and Friday are both TG.Day, yet they need to be intersected for a correct parse. Is there any other way to fix this bug?
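
To make the trade-off concrete, here is a minimal sketch on toy types. None of this is Duckling's code: Tok, Component, allowBySameGrain and allowByComponents are hypothetical names invented for illustration, and the second guard is only one possible direction, not a tested fix. It compares a same-grain guard with a guard that refuses to intersect two tokens that already constrain the same calendar component. Note that in the debug output above, "March 4" has grain=Day while the trailing "March" has grain=Month, so a same-grain guard alone would not block that intersection.

```haskell
import Data.List (intersect)

-- Toy model only: these types are hypothetical and are not Duckling's TimeData.
data Grain = Day | Month deriving (Eq, Show)

-- Which calendar component a token constrains.
data Component = MonthOfYear | DayOfMonth | DayOfWeek deriving (Eq, Show)

data Tok = Tok
  { tokText       :: String
  , tokGrain      :: Grain
  , tokComponents :: [Component]
  } deriving (Show)

march4, march3, march, friday :: Tok
march4 = Tok "March 4" Day   [MonthOfYear, DayOfMonth]
march3 = Tok "March 3" Day   [MonthOfYear, DayOfMonth]
march  = Tok "March"   Month [MonthOfYear]
friday = Tok "Friday"  Day   [DayOfWeek]

-- Guard 1: only intersect tokens whose grains differ.
allowBySameGrain :: Tok -> Tok -> Bool
allowBySameGrain a b = tokGrain a /= tokGrain b

-- Guard 2 (hypothetical): only intersect tokens that constrain
-- disjoint sets of calendar components.
allowByComponents :: Tok -> Tok -> Bool
allowByComponents a b = null (tokComponents a `intersect` tokComponents b)
```

Loaded in GHCi, allowBySameGrain march3 friday is False (it wrongly blocks "March 3, Friday") and allowBySameGrain march4 march is True (it still allows "March 4, March"). With the component guard, allowByComponents march3 friday is True, while allowByComponents march4 march and allowByComponents friday friday are both False. A real rule set would need to cover many more forms than these three components.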

chessai commented 2 years ago

What version of duckling are you using? "This" (as in [This Time]) hasn't been used for almost a year. It's "Seal" now.

PrajnyaSatish commented 2 years ago

duckling-0.1.6.1. But I see the same issue on the wit webpage too.

chessai commented 2 years ago

The wit webpage is using a significantly older version than even what you're using. I recommend trying out 0.2 or master and seeing if the issue exists there.

PrajnyaSatish commented 2 years ago

OK, I've updated duckling to version 0.2.0.0 (checked out from master). The problem remains:

*Duckling.Debug> debug (makeLocale EN $ Just US) "March 4, March 23" [Seal Time]
intersect by ",", "of", "from", "'s" (March 4, March)
-- <named-month> <day-of-month> (non ordinal) (March 4)
-- -- March (March)
-- -- -- regex (March)
-- -- integer (numeric) (4)
-- -- -- regex (4)
-- regex (,)
-- March (March)
-- -- regex (March)
<named-month> <day-of-month> (non ordinal) (March 23)
-- March (March)
-- -- regex (March)
-- integer (numeric) (23)
-- -- regex (23)
[Entity {dim = "time", body = "March 4, March", value = RVal Time (TimeValue (SimpleValue (InstantValue {vValue = 2013-03-04 00:00:00 -0200, vGrain = Day})) [SimpleValue (InstantValue {vValue = 2013-03-04 00:00:00 -0200, vGrain = Day}),SimpleValue (InstantValue {vValue = 2014-03-04 00:00:00 -0200, vGrain = Day}),SimpleValue (InstantValue {vValue = 2015-03-04 00:00:00 -0200, vGrain = Day})] Nothing), start = 0, end = 14, latent = False, enode = Node {nodeRange = Range 0 14, token = Token Time TimeData{latent=False, grain=Day, form=Nothing, direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 0 7, token = Token Time TimeData{latent=False, grain=Day, form=Nothing, direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 0 5, token = Token Time TimeData{latent=False, grain=Month, form=Just (Month {month = 3}), direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 0 5, token = Token RegexMatch (GroupMatch []), children = [], rule = Nothing}], rule = Just "March"},Node {nodeRange = Range 6 7, token = Token Numeral (NumeralData {value = 4.0, grain = Nothing, multipliable = False, okForAnyTime = True}), children = [Node {nodeRange = Range 6 7, token = Token RegexMatch (GroupMatch ["4"]), children = [], rule = Nothing}], rule = Just "integer (numeric)"}], rule = Just "<named-month> <day-of-month> (non ordinal)"},Node {nodeRange = Range 7 8, token = Token RegexMatch (GroupMatch []), children = [], rule = Nothing},Node {nodeRange = Range 9 14, token = Token Time TimeData{latent=False, grain=Month, form=Just (Month {month = 3}), direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 9 14, token = Token RegexMatch (GroupMatch []), children = [], rule = Nothing}], rule = Just "March"}], rule = Just "intersect by \",\", \"of\", \"from\", \"'s\""}},Entity {dim = "time", body = "March 23", value = RVal Time (TimeValue (SimpleValue 
(InstantValue {vValue = 2013-03-23 00:00:00 -0200, vGrain = Day})) [SimpleValue (InstantValue {vValue = 2013-03-23 00:00:00 -0200, vGrain = Day}),SimpleValue (InstantValue {vValue = 2014-03-23 00:00:00 -0200, vGrain = Day}),SimpleValue (InstantValue {vValue = 2015-03-23 00:00:00 -0200, vGrain = Day})] Nothing), start = 9, end = 17, latent = False, enode = Node {nodeRange = Range 9 17, token = Token Time TimeData{latent=False, grain=Day, form=Nothing, direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 9 14, token = Token Time TimeData{latent=False, grain=Month, form=Just (Month {month = 3}), direction=Nothing, holiday=Nothing, hasTimezone=False}, children = [Node {nodeRange = Range 9 14, token = Token RegexMatch (GroupMatch []), children = [], rule = Nothing}], rule = Just "March"},Node {nodeRange = Range 15 17, token = Token Numeral (NumeralData {value = 23.0, grain = Nothing, multipliable = False, okForAnyTime = True}), children = [Node {nodeRange = Range 15 17, token = Token RegexMatch (GroupMatch ["23"]), children = [], rule = Nothing}], rule = Just "integer (numeric)"}], rule = Just "<named-month> <day-of-month> (non ordinal)"}}]