KayLerch / alexa-utterance-generator

Use this tool if you'd like to generate hundreds or thousands of utterance variants for your Alexa skills.
Apache License 2.0
83 stars, 11 forks

Low performance while generating millions of utterances #1

Closed: rekire closed this issue 6 years ago

rekire commented 6 years ago

I noticed that performance becomes very poor when you generate a lot of permutations. The problem is that every new sentence is validated against every sentence generated so far, so the total work grows quadratically with the number of utterances.
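To illustrate why this pattern degrades so quickly, here is a minimal sketch, assuming the check is a linear scan over all previously stored sentences. `NaiveStore` and its methods are hypothetical names for illustration, not the library's actual `UtteranceGenerator.store()` code. With n distinct utterances, the k-th call does k-1 comparisons, for n(n-1)/2 in total.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a store step that validates each new utterance
// against every utterance stored so far (not the library's actual code).
final class NaiveStore {
    private final List<String> stored = new ArrayList<>();
    private long comparisons = 0;

    // Linear scan over everything stored so far: the k-th call costs O(k).
    boolean store(String utterance) {
        for (String existing : stored) {
            comparisons++;
            if (existing.equals(utterance)) {
                return false; // duplicate, rejected
            }
        }
        stored.add(utterance);
        return true;
    }

    long comparisons() { return comparisons; }

    public static void main(String[] args) {
        NaiveStore store = new NaiveStore();
        int n = 2_000;
        for (int i = 0; i < n; i++) {
            store.store("utterance " + i);
        }
        // With n distinct utterances, total comparisons = n * (n - 1) / 2.
        System.out.println(store.comparisons()); // 1999000
    }
}
```

Doubling n roughly quadruples the comparison count, which is why the slowdown only becomes noticeable at large permutation counts.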

The performance issue is located in UtteranceGenerator.store(). For my own use I removed the check and added a post-processing check instead.

Please consider moving the check out of the store step.
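A minimal sketch of the suggested change, assuming "the check" is exact-duplicate detection: `store()` only appends, and duplicates are removed in one pass at the end. `DeferredStore` and `finish()` are hypothetical names, not the library's API. A `LinkedHashSet` is used here because it drops duplicates in expected linear time while keeping first-seen order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sketch: store() only appends; duplicates are removed in a
// single post-processing pass. Names are illustrative, not the library's API.
final class DeferredStore {
    private final List<String> raw = new ArrayList<>();

    // Amortized O(1): no validation while generating.
    void store(String utterance) {
        raw.add(utterance);
    }

    // One pass over all utterances; the LinkedHashSet drops exact duplicates
    // in expected O(n) time and preserves first-seen order.
    List<String> finish() {
        return new ArrayList<>(new LinkedHashSet<>(raw));
    }

    public static void main(String[] args) {
        DeferredStore store = new DeferredStore();
        String[] generated = {"turn on the light", "switch on the light", "turn on the light"};
        for (String u : generated) {
            store.store(u);
        }
        System.out.println(store.finish()); // [turn on the light, switch on the light]
    }
}
```

The trade-off is memory: all raw utterances, duplicates included, are held until `finish()` runs.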

KayLerch commented 6 years ago

Sounds reasonable :) I never stressed this tool the way you did. I agree this needs improvement, even though it is not very relevant for Alexa skills: the interaction model size is limited and won't hold millions of utterances. The most I've seen so far was about 15k utterances.
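A quick back-of-the-envelope count shows why the per-store check is tolerable at skill scale but not at millions. Assuming the pairwise check from the original report and all utterances distinct (the worst case for a linear scan), the number of comparisons for n utterances is:

```latex
C(n) = \frac{n(n-1)}{2}

C(15\,000) = \frac{15\,000 \cdot 14\,999}{2} = 112\,492\,500 \approx 1.1 \times 10^{8}

C(10^{6}) = \frac{10^{6}\,(10^{6} - 1)}{2} = 499\,999\,500\,000 \approx 5 \times 10^{11}
```

Going from 15k to one million utterances multiplies the work by roughly 4,400, even though the input grows only about 67-fold.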

KayLerch commented 6 years ago

Hi Rekire,

With version 2 of the generator I paid more attention to runtime performance. I followed your suggestion and now do validation and duplicate detection at the very end of the process, but it still takes ages to process millions of utterances. Sorry about that. I tried to improve the detection logic, but there still seems to be room for improvement. If you want to give it a try, here it is: https://github.com/KayLerch/alexa-utterance-generator/blob/master/src/main/java/io/klerch/alexa/utterances/model/InteractionModel.java#L110
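One possible direction for the end-of-process detection, sketched under assumptions: key each utterance by a normalized form in a `HashSet`, so each utterance costs one hash lookup rather than a comparison against everything kept so far. The thread does not say what the generator treats as a duplicate, so the normalization rules (trim, lowercase, collapse whitespace) and the `DuplicateFilter` class are hypothetical and are not the code linked above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical duplicate detection run once at the end of generation.
// The normalization rules are an assumption for illustration only.
final class DuplicateFilter {

    // Assumed normalization: trim, lowercase, collapse runs of whitespace.
    static String normalize(String utterance) {
        return utterance.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    // One hash lookup per utterance instead of a scan over all kept
    // utterances, so the whole pass is expected O(n).
    static List<String> dedupe(List<String> utterances) {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String u : utterances) {
            if (seen.add(normalize(u))) { // add() returns false if already present
                kept.add(u);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("Turn on  the light", "turn on the light", "dim the light");
        System.out.println(dedupe(input)); // [Turn on  the light, dim the light]
    }
}
```

The first spelling encountered is kept; if a different preference is needed (for example, the shortest variant), the set could be swapped for a map from normalized key to the preferred original.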

For now, you can disable validation in the generator before calling generate(). That is the only speed-up I can offer right now ;)
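The workaround trades correctness checks for speed. The sketch below shows that trade-off with a hypothetical `ToggleGenerator`; the thread does not name the real setter for disabling validation, so `setValidationEnabled` is an invented name, not the library's API. For illustration, "validation" is reduced here to exact-duplicate removal, whereas the real tool may check more.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical generator showing what switching validation off skips.
// NOT the library's API: the thread does not name the actual setter.
final class ToggleGenerator {
    private boolean validationEnabled = true;

    void setValidationEnabled(boolean enabled) {
        this.validationEnabled = enabled;
    }

    // 'expanded' stands in for the already-expanded permutations.
    List<String> generate(List<String> expanded) {
        if (!validationEnabled) {
            return new ArrayList<>(expanded); // fast path: duplicates are kept
        }
        return new ArrayList<>(new LinkedHashSet<>(expanded)); // dedupe pass
    }

    public static void main(String[] args) {
        List<String> expanded = Arrays.asList("play music", "play music", "stop");
        ToggleGenerator gen = new ToggleGenerator();
        gen.setValidationEnabled(false);
        System.out.println(gen.generate(expanded).size()); // 3
    }
}
```

With validation off, any duplicates or invalid entries pass straight through, so a separate clean-up step may still be needed before uploading an interaction model.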

KayLerch commented 6 years ago

Closing this for now; see the workaround above. Anyone is welcome to improve the algorithm referenced above.