SimGus / Chatette

A powerful dataset generator for Rasa NLU, inspired by Chatito
MIT License

Incorrect entity position in rasa adapter #22

Closed. hungph-dev-ict closed this issue 5 years ago

hungph-dev-ict commented 5 years ago

{
    "entities": [
      {
        "end": 112,
        "entity": "bot_job",
        "start": 93,
        "value": "sứ mệnh như thế nào"
      }
    ],
    "intent": "ask_for_bot_job",
    "text": "xếp sứ mệnh như thế nào"
},

I'm using version 1.6.0 and it generates the wrong entity positions in the sentence; the output above is an example. Please take a look. Thank you!

SimGus commented 5 years ago

I cannot reproduce this bug; I get the following output:

{
    "entities": [
      {
        "end": 23, 
        "entity": "slot", 
        "start": 4, 
        "value": "sứ mệnh như thế nào"
      }
    ], 
    "intent": "intent", 
    "text": "xếp sứ mệnh như thế nào"
}

Are you using a file encoding other than UTF-8? (This is likely the case if you're on Windows.) If so, try converting the file to UTF-8. Chatette should always work well with Unicode files.

If this is not an encoding problem, could you provide the template you used to get this output please?
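
For anyone who wants to rule out the encoding question first, here is a minimal Python sketch, not part of Chatette. The helper names `is_valid_utf8` and `reencode_to_utf8` are made up for illustration, and the sketch assumes you already know (or can guess) the file's current encoding. Note that a pure ASCII file is already valid UTF-8.

```python
from pathlib import Path


def is_valid_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def reencode_to_utf8(path, source_encoding):
    """Rewrite the file in place as UTF-8.

    `source_encoding` is the encoding the file is currently in (an
    assumption supplied by the caller, e.g. "latin-1" or "cp1252").
    Decoding may raise UnicodeDecodeError if that guess is wrong.
    """
    p = Path(path)
    text = p.read_bytes().decode(source_encoding)
    p.write_bytes(text.encode("utf-8"))
    return text
```

A typical use would be to call `is_valid_utf8` on the template file and, only if it returns False, call `reencode_to_utf8` with the encoding your editor reports for that file.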

devdeca commented 5 years ago

I'm having the same problem; my file's encoding is us-ascii. Interestingly, if I set 'training' to 1 or 2, the generation works fine, but when I raise the value, the result looks like the output shown below.

My template:

%[askPopulation]('training': '5')
    ~[greet?] ~[population] @[city]

@[city]
    new york
    tokyo

~[greet]
    hi
    hello

~[population]
    what is the population of
    how many people are there in

The output:

{
  "rasa_nlu_data": {
    "common_examples": [
      {
        "entities": [
          {
            "end": 183,
            "entity": "city",
            "start": 178,
            "value": "tokyo"
          }
        ],
        "intent": "askPopulation",
        "text": "how many people are there in tokyo"
      },
      {
        "entities": [
          {
            "end": 186,
            "entity": "city",
            "start": 178,
            "value": "new york"
          }
        ],
        "intent": "askPopulation",
        "text": "hello what is the population of new york"
      },
      {
        "entities": [
          {
            "end": 183,
            "entity": "city",
            "start": 178,
            "value": "tokyo"
          }
        ],
        "intent": "askPopulation",
        "text": "what is the population of tokyo"
      },
      {
        "entities": [
          {
            "end": 186,
            "entity": "city",
            "start": 178,
            "value": "new york"
          }
        ],
        "intent": "askPopulation",
        "text": "how many people are there in new york"
      },
      {
        "entities": [
          {
            "end": 186,
            "entity": "city",
            "start": 178,
            "value": "new york"
          }
        ],
        "intent": "askPopulation",
        "text": "what is the population of new york"
      }
    ],
    "entity_synonyms": [],
    "lookup_tables": [],
    "regex_features": []
  }
}

SimGus commented 5 years ago

There is a problem indeed. It doesn't seem related to encoding. I'll fix that as soon as I can.

Thanks for the heads up :)
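
For readers who want to check their own generated files for this bug, here is a minimal Python sketch. It is not Chatette's code; the function names are hypothetical. It assumes the JSON layout shown in this thread (`rasa_nlu_data` / `common_examples`), that `end` is exclusive, and that no synonyms are used, so each entity's `value` should equal `text[start:end]`.

```python
def find_misaligned_entities(rasa_data):
    """Return (text, entity) pairs whose offsets do not select entity["value"]."""
    problems = []
    for example in rasa_data["rasa_nlu_data"]["common_examples"]:
        text = example["text"]
        for entity in example.get("entities", []):
            # With correct offsets, slicing the text gives back the value.
            if text[entity["start"]:entity["end"]] != entity["value"]:
                problems.append((text, entity))
    return problems


def expected_span(text, value):
    """Return (start, end) of the first occurrence of value in text, or None."""
    start = text.find(value)
    if start == -1:
        return None
    return start, start + len(value)


# One example copied from the output reported in this thread.
sample = {
    "rasa_nlu_data": {
        "common_examples": [
            {
                "entities": [
                    {"end": 183, "entity": "city", "start": 178, "value": "tokyo"}
                ],
                "intent": "askPopulation",
                "text": "how many people are there in tokyo",
            }
        ]
    }
}

for text, entity in find_misaligned_entities(sample):
    print(text, entity["start"], entity["end"], expected_span(text, entity["value"]))
```

On the sample above, the reported span 178 to 183 lies past the end of the 34-character sentence, while `expected_span` locates "tokyo" at 29 to 34.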

SimGus commented 5 years ago

This is fixed on the master branch. The fix will be included in the next release.

Cheers

SimGus commented 5 years ago

The newest version of Chatette (v1.6.1) has just been released with this bug fixed!