Hi,
What you report seems related to issue #22 which was fixed in v1.6.1. Using the template you provided, I cannot reproduce the results you get.
Would you mind giving a little more information about the environment you run Chatette in? Namely, I'd need to know the output of:
python -m chatette --version
Thanks in advance :)
Thanks a lot, it works fine!! I have just one other problem: I use the French language, but the current code does not seem to support UTF-8 encoding. Any help please?
You're welcome!
For Unicode encoding, I guess you are running Windows, which is likely to encode your files using the Windows-1252 encoding. Try using a file editor that allows you to save your template files in UTF-8; feeding them to Chatette should then work without a problem. Using a recent version of Python (>= 3.5, I would say) could also help.
Just so you know, I speak French and I have no problem running Chatette on Linux (whatever the Python version) to produce French datasets, with all my files encoded in UTF-8.
Hope this helps! Feel free to ask if you need help again!
Thanks a lot. It works fine for me.
Just another question: how can I avoid redundancy? In fact, if I put 100 as the augmentation parameter, the algorithm generates redundant sentences.
What exactly do you mean by "redundancy"? The generated data should never contain duplicates, so if you get the same generated sentence twice (or more times), this is a bug.
If you mean your sentences are too close to each other, this simply depends on your templates. A good way to have variation in the generated sentences is to have a lot of rules in your aliases, slots and templates. You can take a look at different examples on the repo if you want to see how to make good templates.
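For example, an alias gathering several phrasings could look roughly like the sketch below (the alias name ~[ask_where] and the wordings are made up for illustration, not taken from the repo examples):

~[ask_where]
    where is
    where can I find
    could you tell me where to find

%[ask_toilet](100)
    ~[ask_where] the @[toilet#singular], @[please]?

The more rules each alias and slot contains, the more distinct sentences can be generated for a single intent.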
Cheers!
Hello @SimGus
I am so sorry for my late response.
I have tried to use the simple example in the toilets directory and I have set the number of generated sentences to 1000 instead of 3. The algorithm generates duplicates. Do you have a solution to avoid that?
Hey @Asma-droid,
The two sentences are actually not duplicates: one of them starts with an uppercase letter, while the other one starts with a lowercase letter.
If you want to avoid generating sentences with different cases, simply remove the ampersand & at the beginning of the unit declaration (so for the toilet example, remove the ampersand in %[&ask_toilet](100)).
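Concretely, the change would look roughly like this (the rule underneath each declaration is just a placeholder to show the structure):

With the ampersand, both "Where ..." and "where ..." variants are generated:

%[&ask_toilet](100)
    where the @[toilet#singular] is @[please]?

Without it, only the case written in the template is generated:

%[ask_toilet](100)
    where the @[toilet#singular] is @[please]?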
I hope this helps.
Thank you for this response!
You're welcome! I will close this issue as it seems fixed.
Hi all,
Thanks for this great library :-)
I would like to use several slots within the same sentence, but the produced JSON file does not pick up the right start and end of the slots.
Below is a simple example:
**** txt_file ***
%[&ask_toilet]
    where the @[toilet#singular] is @[please]?

@[toilet#singular]
    toilet
    loo

@[please]
    please
    plz
*** json result ***
{
    "rasa_nlu_data": {
        "common_examples": [
            {
                "entities": [
                    { "end": 13, "entity": "toilet", "start": 10, "value": "loo" },
                    { "end": 42, "entity": "please", "start": 36, "value": "please" }
                ],
                "intent": "ask_toilet",
                "text": "Where the loo is please?"
            },
            {
                "entities": [
                    { "end": 13, "entity": "toilet", "start": 10, "value": "loo" },
                    { "end": 39, "entity": "please", "start": 36, "value": "plz" }
                ],
                "intent": "ask_toilet",
                "text": "where the loo is plz?"
            },
            {
                "entities": [
                    { "end": 16, "entity": "toilet", "start": 10, "value": "toilet" },
                    { "end": 39, "entity": "please", "start": 36, "value": "plz" }
                ],
                "intent": "ask_toilet",
                "text": "where the toilet is plz?"
            }
        ],
        "entity_synonyms": [],
        "lookup_tables": [],
        "regex_features": []
    }
}
Any idea, please?