facebook / duckling

Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.
Other
4.05k stars 720 forks source link

[EN/DE/ES/...] 'Part of Day' rule coerces timespans #648

Open emlautarom1 opened 3 years ago

emlautarom1 commented 3 years ago

Some time spans are incorrectly interpreted, yielding values that seem caused by a defaulting rule. Parsing is correct, but when building the final Entitys for some reason the time span from and to are incorrect.

All of these report the correct time spans:

vormittags von 3 bis 11 Uhr
vormittags von 4 bis 11 Uhr
...
vormittags von 10 bis 11 Uhr

If the lower bound is 1 or 2 the lower bound is (incorrectly) set to 03:00.

On the other side, if the upper bound is 12 or greater then the time span defaults to 03:00 to 12:00 consistently. I tracked down the issue to the rule time-of-day (latent):

https://github.com/facebook/duckling/blob/d8888e2ff842c4dacd570989570c101cf7b43153/Duckling/Time/DE/Rules.hs#L959

As far as I understand, 'latent' means that there's some ambiguity. For example, the number 7 could mean several things depending on the context.

Lets change this rule and use the check n < 13 like in the EN version:

https://github.com/facebook/duckling/blob/d8888e2ff842c4dacd570989570c101cf7b43153/Duckling/Time/EN/Rules.hs#L577

This will no longer break the lower bound when the upper bound is 12. Nevertheless, the issue remains for the upper bound: values 12 ... as upper bound result in ranges 03:00 to 12:00 consistently. Do these values ring any bell? some kind of defaulting rule?

emlautarom1 commented 3 years ago

The morning rule may be the cause of said values (03:00 and 12:00):

https://github.com/facebook/duckling/blob/d8888e2ff842c4dacd570989570c101cf7b43153/Duckling/Time/DE/Rules.hs#L1339-L1350

Shouldn't be 13 the upper bound, since it's non-inclusive?

It seems that the morning rule has precedence over the time-of-day (latent) rule, the latter correctly detecting the hours in the input.

How can we alter the precedence of said rules? Explicit hours should not be coerced into a range given some modifier like morning.

emlautarom1 commented 3 years ago

Update: This issue is not exclusive to German, it also affects languages like English. For example, the input "morning from 9 to 12" returns the time span 00:00 to 01:00. The root cause seems to be the rule partOfDays which is similar to the German morning rule:

https://github.com/facebook/duckling/blob/72f45e8e2c7385f41f2f8b1f063e7b5daa6dca94/Duckling/Time/EN/Rules.hs#L1001-L1019

The lower bound seems to be coerced into the start of 'morning' (00:00). I can't seem to find what sets the upper bound to 01:00 though.

Any other input where the upper bound is greater than 12 results in the time span 00:00 to 12:00, which are the bounds for "morning"

emlautarom1 commented 3 years ago

Update: The same issue happens with Spanish. For example, the input "a la mañana de 9 a 13" returns the time span 04:00 to 12:00 which, again, are the bounds for "morning":

https://github.com/facebook/duckling/blob/72f45e8e2c7385f41f2f8b1f063e7b5daa6dca94/Duckling/Time/ES/Rules.hs#L939-L950

Inputs with 12 result in 12:00 as the upper bound (it should be 13:00). Values higher than 12 as upper bound result in time spans 04:00 to 12:00 consistently.

emlautarom1 commented 3 years ago

The same solution should apply to those 3 languages (I'm sure there are more affected by this bug): the "morning" / "part of day" rule should not have priority over literal values.

chessai commented 3 years ago

Prettified debug output:

> debug (makeLocale EN Nothing) "morning between 9 and 12" [Seal Time]
intersect (morning between 9 and 12)
-- part of days (morning)
-- -- regex (morning)
-- between <time-of-day> and <time-of-day> (interval) (between 9 and 12)
-- -- regex (between)
-- -- time-of-day (latent) (9)
-- -- -- integer (numeric) (9)
-- -- -- -- regex (9)
-- -- regex (and)
-- -- time-of-day (latent) (12)
-- -- -- integer (numeric) (12)
-- -- -- -- regex (12)
[
    {
        "body": "morning between 9 and 12",
        "dim": "time",
        "end": 24,
        "latent": false,
        "start": 0,
        "value": {
            "from": {
                "grain": "hour",
                "value": "2013-02-12T00:00:00.000-02:00"
            },
            "to": {
                "grain": "hour",
                "value": "2013-02-12T01:00:00.000-02:00"
            },
            "type": "interval",
            "values": [
                {
                    "from": {
                        "grain": "hour",
                        "value": "2013-02-12T00:00:00.000-02:00"
                    },
                    "to": {
                        "grain": "hour",
                        "value": "2013-02-12T01:00:00.000-02:00"
                    },
                    "type": "interval"
                },
                {
                    "from": {
                        "grain": "hour",
                        "value": "2013-02-12T09:00:00.000-02:00"
                    },
                    "to": {
                        "grain": "hour",
                        "value": "2013-02-12T12:00:00.000-02:00"
                    },
                    "type": "interval"
                },
                {
                    "from": {
                        "grain": "hour",
                        "value": "2013-02-13T00:00:00.000-02:00"
                    },
                    "to": {
                        "grain": "hour",
                        "value": "2013-02-13T01:00:00.000-02:00"
                    },
                    "type": "interval"
                }
            ]
        }
    }
]
chessai commented 3 years ago

Interestingly, the correct answer appears in the possible answers in the response.

emlautarom1 commented 3 years ago

@chessai I don't think that's the correct answer considering that the upper bound should be exclusive: we should get 09:00 - 13:00. I think that the 12:00 that we get as upper bound is just the upper bound of morning, that is 00:00 - 12:00.