SimGus / Chatette

A powerful dataset generator for Rasa NLU, inspired by Chatito
MIT License

Convert Rasa NLU training data to Chatette format #40

Closed by visvamba 4 years ago

visvamba commented 4 years ago

Is it possible to convert my data/nlu.md or nlu.json into a Chatette file? The base file option only extracts the regex and lookup from Rasa files, from what I can tell.

SimGus commented 4 years ago

Hey @visvamba! Thanks for your question.

Chatette is meant to generate training data (i.e. the nlu.md or nlu.json files in Rasa) from templates, not the other way around. Generally, you'll have to either write your templates yourself (or use someone else's) or write the training data directly.

There is actually a way to turn a training data file into a template file automatically, but it would not extract the structure behind the examples, so the result wouldn't make much sense. For example, given the following nlu.md:

## intent:ask-food
- I want [food](food)
- I want a little bit of [fish](food)
- I would like [fish](food)
- I would like some [food](food)

you could automatically create this template file:

%[ask-food](4)
  I want @[food]
  I want a little bit of @[food]
  I would like @[food]
  I would like some @[food]

@[food]
  fish
  food

which I assume is not what you want. Given the same training data, the following template would make much more sense, but it is not easy to generate automatically, if it is possible at all:

%[ask-food](4)
  I want ~[some?] @[food]
  I would like ~[some?] @[food]

~[some]
  some
  a little bit of

@[food]
  food
  fish

This is obviously a very simple example, but I guess you see the point: a template that simply lists each and every example from your training data doesn't make much sense. If you really need it, though, it shouldn't be too hard to write a small bash script that turns your nlu.md file into a "dumb" template file like the first one I showed you.
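
A minimal sketch of such a "dumb" converter, written in Python rather than bash for readability. It is not part of Chatette: the function name `rasa_md_to_dumb_template` and both regular expressions are assumptions made for this illustration. It only handles `## intent:` sections whose examples are `- ` or `* ` bullets with `[value](entity)` annotations, it skips every other section (synonyms, regexes, lookups), and it does not escape characters that have a special meaning in Chatette templates. The output mirrors the `%[intent](N)` style used earlier in this thread.

```python
import re

# Assumed shape of the Rasa Markdown format: "## intent:<name>" headers
# followed by "- example with [value](entity)" bullets.
INTENT_HEADER = re.compile(r"^##\s*intent:\s*(\S+)\s*$")
# Matches [value](entity) and [value](entity:synonym); the synonym is ignored.
ENTITY = re.compile(r"\[([^\]]+)\]\(([^):]+)(?::[^)]*)?\)")


def rasa_md_to_dumb_template(markdown: str) -> str:
    """Turn Rasa Markdown training data into a flat Chatette-style template."""
    intents = {}   # intent name -> list of unique rules, in order of appearance
    entities = {}  # entity name -> list of unique values, in order of appearance
    current = None

    def to_slot(match):
        # Record the annotated value and replace the annotation with a slot.
        value, entity = match.group(1), match.group(2)
        values = entities.setdefault(entity, [])
        if value not in values:
            values.append(value)
        return f"@[{entity}]"

    for raw in markdown.splitlines():
        line = raw.strip()
        header = INTENT_HEADER.match(line)
        if header:
            current = header.group(1)
            intents.setdefault(current, [])
            continue
        if line.startswith("##"):
            current = None  # some other section: skipped in this sketch
            continue
        if current is None or line[:2] not in ("- ", "* "):
            continue
        rule = ENTITY.sub(to_slot, line[2:].strip())
        if rule not in intents[current]:
            intents[current].append(rule)

    blocks = []
    for name, rules in intents.items():
        header_line = f"%[{name}]({len(rules)})"
        blocks.append("\n".join([header_line] + [f"  {r}" for r in rules]))
    for name, values in entities.items():
        blocks.append("\n".join([f"@[{name}]"] + [f"  {v}" for v in values]))
    return "\n\n".join(blocks) + "\n"
```

Calling `rasa_md_to_dumb_template(open("data/nlu.md").read())` on the nlu.md above should give the same flat template as the first one shown, with the slot values listed in the order they first appear in the file.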

If you're interested, this problem of extracting structure (i.e. templates) from a list of examples is actually still an active research subject.

All that being said, could you please tell me why you need the reverse of the process that Chatette performs?

Cheers.

visvamba commented 4 years ago

Thanks for your explanation. I can see why the reverse process would be better done by hand. I asked because I already have a moderately sized training data set written in Markdown for my Rasa model, and I think the Chatette format is a lot cleaner and easier to organise.