SimGus / Chatette

A powerful dataset generator for Rasa NLU, inspired by Chatito
MIT License
317 stars, 56 forks

ranking list, frequency counter for synonyms #21

Open NLPSoftwareDemo opened 4 years ago

NLPSoftwareDemo commented 4 years ago

A synonym list can be rather large, and we do not want to generate all possible synonym variants with the same probability. We would therefore like more control over which synonym variants are generated.

Is it possible to add a ranking list or a frequency counter for the synonym list?

SimGus commented 4 years ago

When you talk about synonyms, do you mean the arrays of synonyms that are generated from slot values and that Rasa uses as synonyms (see their documentation), or are you simply talking about aliases?

If you are just talking about aliases, I am indeed thinking of adding a specific syntax to modify the behavior of Chatette when it comes to choosing a rule to generate: you will then be able to specify a probability (or a frequency) for each rule in an alias, a slot or an intent.

As you can imagine, adding this will take a little time; I'll let you know when the feature is available. If you need it soon, I would advise you to use Chatito, which already has a specific syntax for that.

NLPSoftwareDemo commented 4 years ago

Thanks for the answer.

Yes, indeed, I was talking about aliases. Happy to hear that you are thinking about this feature. I can't wait to use it once it is done!

SimGus commented 4 years ago

You're welcome :)

In the meantime, you can work around this by listing a rule several times when you want it to be generated with a higher probability. For example, in the following alias, the rule "often" will be generated twice as often as the rule "rarely":

~[alias]
   often
   often
   rarely
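
To see why the duplication trick works, here is a minimal Python sketch. It is not Chatette's actual code: it assumes that each listed rule is picked uniformly at random, and the names `rules`, `expected_share` and `sample_rules` are made up for this illustration. Under that assumption, a rule listed twice out of three lines is chosen about two thirds of the time.

```python
import random
from collections import Counter

# Hypothetical stand-in for the rules listed under ~[alias] above.
# This is NOT Chatette's internal representation, only an illustration.
rules = ["often", "often", "rarely"]


def expected_share(rules):
    """Exact probability of each distinct rule under a uniform pick."""
    counts = Counter(rules)
    return {rule: count / len(rules) for rule, count in counts.items()}


def sample_rules(rules, n, seed=0):
    """Pick n rules uniformly at random and count how often each appears."""
    rng = random.Random(seed)
    return Counter(rng.choice(rules) for _ in range(n))


shares = expected_share(rules)
counts = sample_rules(rules, 30_000)

print("expected shares:", shares)
print("sampled counts: ", dict(counts))
```

Listing a rule k times gives it the same relative weight as a frequency of k would, which is why the workaround can stand in for a dedicated probability syntax as long as the lists stay short.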