facebook / duckling

Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.

Time/ES: next <weekday> returns different results in English than in Spanish #623

Closed clobotorre closed 3 years ago

clobotorre commented 3 years ago

I would like to revisit this issue, because I think it still needs a final response: #620

In English, "next <weekday>" returns a single entry in the values array, while in Spanish "próximo <weekday>" returns three. I would like to know whether the Spanish behaviour could be changed to match the English one.

One example request in English would be:

curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_GB&text="next friday"' | jq

Response:

[
  {
    "body": "next friday",
    "start": 1,
    "value": {
      "values": [
        {
          "value": "2021-06-04T00:00:00.000-07:00",
          "grain": "day",
          "type": "value"
        }
      ],
      "value": "2021-06-04T00:00:00.000-07:00",
      "grain": "day",
      "type": "value"
    },
    "end": 12,
    "dim": "time",
    "latent": false
  }
]

The equivalent request in Spanish:

curl -XPOST http://0.0.0.0:8000/parse --data 'locale=es_ES&text="El próximo viernes"' | jq

Response:

[
    {
        "body": "viernes",
        "dim": "time",
        "end": 18,
        "latent": false,
        "start": 11,
        "value": {
            "grain": "day",
            "type": "value",
            "value": "2021-05-28T00:00:00.000+02:00",
            "values": [
                {
                    "grain": "day",
                    "type": "value",
                    "value": "2021-05-28T00:00:00.000+02:00"
                },
                {
                    "grain": "day",
                    "type": "value",
                    "value": "2021-06-04T00:00:00.000+02:00"
                },
                {
                    "grain": "day",
                    "type": "value",
                    "value": "2021-06-11T00:00:00.000+02:00"
                }
            ]
        }
    }
]

chessai commented 3 years ago

For some background, the values array contains what duckling thinks may be the next most likely meanings of the text. This is entirely informed by the corpora training data (and the resulting classifiers), so the remedy would be a change to the training data rather than to the code. Furthermore, it is difficult to ascertain exactly which set of changes would make the two behaviours match.

Secondly, I'm curious why you care about the values array. For most users it should be irrelevant, and they should only need value.value. I am not trying to dismiss whatever use case you have in mind, just trying to understand why you find it useful to look in the values array.
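
To illustrate reading only value.value, here is a minimal Python sketch. It assumes the response shape shown in the examples above (a JSON list of entities, each with "body", "dim", and a "value" object); the helper name primary_values is hypothetical and not part of duckling.

```python
import json

def primary_values(response_text):
    """Return (body, value.value) pairs for the time entities in a /parse response.

    Assumes the JSON shape shown in this issue. Entities whose "value" object
    has no "value" key (for example, a differently shaped result) yield None.
    """
    entities = json.loads(response_text)
    return [
        (entity["body"], entity["value"].get("value"))
        for entity in entities
        if entity.get("dim") == "time"
    ]

# Trimmed copy of the Spanish response from this issue.
sample = """
[{"body": "viernes", "dim": "time", "start": 11, "end": 18, "latent": false,
  "value": {"grain": "day", "type": "value",
            "value": "2021-05-28T00:00:00.000+02:00",
            "values": [
              {"grain": "day", "type": "value", "value": "2021-05-28T00:00:00.000+02:00"},
              {"grain": "day", "type": "value", "value": "2021-06-04T00:00:00.000+02:00"},
              {"grain": "day", "type": "value", "value": "2021-06-11T00:00:00.000+02:00"}
            ]}}]
"""

print(primary_values(sample))
```

With this approach the length of the values array does not affect the caller, whichever locale is used.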

chessai commented 3 years ago

Also, I just noticed that if you look at what gets matched, in English it's "next friday", but in Spanish it's just "viernes". This means that duckling is not matching "próximo", likely because a rule is missing. This could very well explain the discrepancy.
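
A quick way to spot this kind of partial match on the client side is to compare each entity's start/end span against the input text. The sketch below is in Python and assumes that start and end are character offsets into the text, with end exclusive, as the responses above suggest; the helper name uncovered_segments is hypothetical.

```python
def uncovered_segments(text, entities):
    """Return the non-blank pieces of text not covered by any entity span.

    Assumes each entity carries character offsets "start" (inclusive)
    and "end" (exclusive), as in the responses shown in this issue.
    """
    covered = [False] * len(text)
    for entity in entities:
        for i in range(entity["start"], min(entity["end"], len(text))):
            covered[i] = True

    segments, current = [], []
    for ch, is_covered in zip(text, covered):
        if is_covered:
            if current:
                segments.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        segments.append("".join(current))
    return [s.strip() for s in segments if s.strip()]

# Offsets taken from the Spanish response in this issue.
text = "El próximo viernes"
entities = [{"body": "viernes", "start": 11, "end": 18}]
print(uncovered_segments(text, entities))  # ['El próximo']
```

Any leftover segment that carries meaning, such as "próximo" here, is a hint that the locale may be missing a rule for it.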