facebook / duckling

Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.

Time/EN Written-out times with dashes (e.g. five-thirty) should be respected #604

stroxler closed this issue 3 years ago

stroxler commented 3 years ago

Writing times with a dash is a fairly standard convention (for example, see this blog on grammar), but our current parser only accepts a space as the separator, and even then it treats the result as latent.

Here's "five thirty":

> debugCustom testContext Options{withLatent=True}  "five thirty" [Seal Time]
<hour-of-day> <integer> (five thirty)
-- time-of-day (latent) (five)
-- -- integer (0..19) (five)
-- -- -- regex (five)
-- integer (20..90) (thirty)
-- -- regex (thirty)
[
    {
        "body": "five thirty",
        "dim": "time",
        "end": 11,
        "latent": true,
        "start": 0,
        "value": {
            "grain": "minute",
            "type": "value",
            "value": "2013-02-12T05:30:00.000-02:00",
            "values": [
                {
                    "grain": "minute",
                    "type": "value",
                    "value": "2013-02-12T05:30:00.000-02:00"
                },
                {
                    "grain": "minute",
                    "type": "value",
                    "value": "2013-02-12T17:30:00.000-02:00"
                },
                {
                    "grain": "minute",
                    "type": "value",
                    "value": "2013-02-13T05:30:00.000-02:00"
                }
            ]
        }
    }
]

Here's what we get currently for "five-thirty":

> debugCustom testContext Options{withLatent=True}  "five-thirty" [Seal Time]
time-of-day (latent) (five)
-- integer (0..19) (five)
-- -- regex (five)
year (latent) (-thirty)
-- negative numbers (-thirty)
-- -- regex (-)
-- -- integer (20..90) (thirty)
-- -- -- regex (thirty)
[
    {
        "body": "five",
        "dim": "time",
        "end": 4,
        "latent": true,
        "start": 0,
        "value": {
            "grain": "hour",
            "type": "value",
            "value": "2013-02-12T05:00:00.000-02:00",
            "values": [
                {
                    "grain": "hour",
                    "type": "value",
                    "value": "2013-02-12T05:00:00.000-02:00"
                },
                {
                    "grain": "hour",
                    "type": "value",
                    "value": "2013-02-12T17:00:00.000-02:00"
                },
                {
                    "grain": "hour",
                    "type": "value",
                    "value": "2013-02-13T05:00:00.000-02:00"
                }
            ]
        }
    },
    {
        "body": "-thirty",
        "dim": "time",
        "end": 11,
        "latent": true,
        "start": 4,
        "value": {
            "grain": "year",
            "type": "value",
            "value": "1970-01-01T00:00:00.000-02:00",
            "values": [
                {
                    "grain": "year",
                    "type": "value",
                    "value": "1970-01-01T00:00:00.000-02:00"
                }
            ]
        }
    }
]

chessai commented 3 years ago

Duckling used to accept dashes as general separators, but this caused some issues. You can see how @patapizza undid that with git diff 9c367ab6cd9afe993d817fed8372496f5c49119a d5555d01495621cc8acaeee71cd489e471e9fcd7 (9c367ab6cd9afe993d817fed8372496f5c49119a is the commit that reverted the behavior).

EDIT: That said, we should try to support this with one or more dedicated rules.
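
To illustrate what a dedicated rule could look like, without bringing back dashes as a general separator, here is a rough, language-agnostic sketch in Python. It is not Duckling's rule DSL and not the change that eventually landed; the names (`HOURS`, `TENS`, `UNITS`, `parse_written_time`) and the tiny vocabularies are hypothetical. The idea is that the dash is only accepted in the specific position between a written-out hour and a written-out minute, so "-thirty" is never handed to the negative-number rule on its own.

```python
import re

# Hypothetical, deliberately small vocabularies for illustration only.
UNIT_WORDS = ["one", "two", "three", "four", "five",
              "six", "seven", "eight", "nine"]
HOURS = {w: i for i, w in enumerate(UNIT_WORDS + ["ten", "eleven", "twelve"], start=1)}
UNITS = {w: i for i, w in enumerate(UNIT_WORDS, start=1)}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50}

# <hour> then a space OR a dash, then <tens>, then an optional
# space-or-dash plus <unit>. The dash is accepted only in these slots.
PATTERN = re.compile(
    rf"\b(?P<hour>{'|'.join(HOURS)})[ -]"
    rf"(?P<tens>{'|'.join(TENS)})"
    rf"(?:[ -](?P<unit>{'|'.join(UNITS)}))?\b"
)


def parse_written_time(text):
    """Return (hour, minute) for phrases like 'five-thirty', else None."""
    m = PATTERN.search(text.lower())
    if m is None:
        return None
    hour = HOURS[m.group("hour")]
    unit = m.group("unit")
    minute = TENS[m.group("tens")] + (UNITS[unit] if unit else 0)
    return hour, minute


if __name__ == "__main__":
    for phrase in ["five thirty", "five-thirty", "seven-forty-five", "five"]:
        print(phrase, "->", parse_written_time(phrase))
```

Under these assumptions, "five thirty" and "five-thirty" yield the same hour and minute, while a bare "five" is left for the existing time-of-day rule. Whether the dashed form should stay latent like the spaced one is a separate decision that this sketch does not address.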

chessai commented 3 years ago

Resolved by commit bf696ba185cfc77aabc514380d92f586bb335eb3.