facebook / duckling

Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.

Time/ES non ascii chars - empty response #618

Closed clobotorre closed 3 years ago

clobotorre commented 3 years ago

Spanish special characters (accents and 'ñ') cause an empty response:

curl -XPOST http://0.0.0.0:8000/parse --data 'locale=es_ES&text="el próximo año"'

I don't know whether this is a curl, stack, or duckling issue.

chessai commented 3 years ago

Hmm, seems fine from duckling:

> debug (makeLocale ES Nothing) "el próximo año" [Seal Time]
el proximo <cycle>  (el próximo año)
-- regex (el )
-- regex (próximo)
-- año (grain) (año)
-- -- regex (año)
[
    {
        "body": "el próximo año",
        "dim": "time",
        "end": 14,
        "latent": false,
        "start": 0,
        "value": {
            "grain": "year",
            "type": "value",
            "value": "2014-01-01T00:00:00.000-02:00",
            "values": [
                {
                    "grain": "year",
                    "type": "value",
                    "value": "2014-01-01T00:00:00.000-02:00"
                }
            ]
        }
    }
]

Perhaps this is a LANG issue? How are you running duckling? Docker, Stack, Cabal?

Can you echo $LANG?

clobotorre commented 3 years ago

I am running duckling this way:

stack exec duckling-example-exe
no port specified, defaulting to port 8000
Listening on http://0.0.0.0:8000

The command

echo $LANG

returns an empty string.

If I try a weekday instead of a year, I get different results depending on whether the weekday name (in Spanish) contains an accent. For example, for Monday (lunes), I get results:

curl -XPOST http://0.0.0.0:8000/parse --data 'locale=es_ES&text=el próximo lunes'
[{"body":"lunes","start":11,"value":{"values":[{"value":"2021-05-24T00:00:00.000-07:00","grain":"day","type":"value"},{"value":"2021-05-31T00:00:00.000-07:00","grain":"day","type":"value"},{"value":"2021-06-07T00:00:00.000-07:00","grain":"day","type":"value"}],"value":"2021-05-24T00:00:00.000-07:00","grain":"day","type":"value"},"end":16,"dim":"time","latent":false}]

But for Saturday (sábado), I get an empty result:

curl -XPOST http://0.0.0.0:8000/parse --data 'locale=es_ES&text=el próximo sábado'
[]

It seems that two non-ASCII characters in the request break something.
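
One way to narrow down whether the client side is involved is to look at the bytes the shell actually hands to curl. The following is only a sketch: it assumes a POSIX shell with od and tr available and a terminal (or script file) encoded in UTF-8, and hex_bytes is a hypothetical helper name, not part of curl or duckling.

```shell
# hex_bytes (hypothetical helper): print the raw bytes of a string as hex digits.
hex_bytes() {
  printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# In UTF-8, each accented letter below is encoded as two bytes
# ("á" is c3 a1, "ó" is c3 b3).
hex_bytes 'sábado'; echo
hex_bytes 'próximo'; echo
```

If an accented letter shows up as a single byte (for example e1 for "á"), the terminal is probably not sending UTF-8. If the bytes look right, the client is unlikely to be the cause. curl also has a --data-urlencode option that percent-encodes the field value, which can be used instead of --data for the text parameter.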

chessai commented 3 years ago

From my local server:

❯ echo $LANG
en_US.UTF-8

❯ curl -XPOST http://0.0.0.0:5406/parse --data 'locale=es_ES&text="el próximo año"' | jq
[
  {
    "body": "el próximo año",
    "start": 1,
    "value": {
      "values": [
        {
          "value": "2022-01-01T00:00:00.000-08:00",
          "grain": "year",
          "type": "value"
        }
      ],
      "value": "2022-01-01T00:00:00.000-08:00",
      "grain": "year",
      "type": "value"
    },
    "end": 15,
    "dim": "time",
    "latent": false
  }
]

clobotorre commented 3 years ago

Solved by setting the LANG environment variable before launching duckling. Thanks.
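
For anyone hitting the same problem, the fix described above can be sketched as a small launch script. This is an illustration under assumptions, not an official duckling script: is_utf8_locale and start_duckling are hypothetical helper names, and en_US.UTF-8 is simply the value reported earlier in this thread, which is assumed to be installed on the machine.

```shell
# is_utf8_locale (hypothetical helper): succeed when a locale name ends in a UTF-8 codeset.
is_utf8_locale() {
  case "$1" in
    *.UTF-8 | *.utf8 | *.UTF8 | *.utf-8) return 0 ;;
    *) return 1 ;;
  esac
}

# If LANG is empty or not a UTF-8 locale, fall back to en_US.UTF-8.
# Assumption: that locale exists on this system; any installed UTF-8 locale should do.
if ! is_utf8_locale "${LANG:-}"; then
  export LANG=en_US.UTF-8
fi

# start_duckling (hypothetical wrapper): the launch command used in this thread.
# It inherits the exported LANG when called.
start_duckling() {
  stack exec duckling-example-exe
}
```

Calling start_duckling after sourcing this script starts the example server with LANG already exported. Running echo $LANG in the same shell beforehand should print a UTF-8 locale rather than an empty line.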