kjwill555 closed this 4 years ago
I've been trying to figure out where the hard limit of 7 seconds comes from. demo_cli tries to stretch or squeeze everything into 7 seconds, and if you overload it, it just outputs random syllables with echo.
There's a section in the spectrum program that lets you set the sizes and related parameters, but if you change them, some library complains that the variables have the wrong values. I'm about 70% sure that all of the limitations of demo_cli.py come from the spectrum parameters and the libraries it uses to create the spectrum.
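One possible workaround for the overload case, as a rough sketch only: split long input into pieces of about 20 words (the length mentioned elsewhere in this thread) and synthesize each piece separately. The helper name `split_into_chunks` and the `max_words` parameter are made up for illustration and are not part of demo_cli.py; whether chunking actually avoids the garbled output is an assumption, not something confirmed here.

```python
import re


def split_into_chunks(text, max_words=20):
    """Split text into sentence-aligned chunks of at most max_words words.

    Hypothetical helper, not part of the repository. The default of 20
    words is taken from the figure quoted in this thread.
    """
    # Break on whitespace that follows sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], []
    for sentence in sentences:
        words = sentence.split()
        # A single sentence over the budget is cut into fixed-size pieces.
        while len(words) > max_words:
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        # Start a new chunk if this sentence would overflow the current one.
        if len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    for chunk in split_into_chunks("one two three. four five six seven. eight", 4):
        print(chunk)
```

Each chunk would then be passed to the synthesizer on its own and the resulting audio concatenated; that step is left out because it depends on how you call the model.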
Closing this as duplicate of #53. Let's work the issue there.
It would be awesome to be able to use this to help train a hot word detector. In addition to recording myself saying the hotword, I could create an even larger dataset by adding outputs of this model that used my voice as the reference.
The problem with that, however, is that this model seems to work well only on sentences of medium length (around 20 words, according to demo_cli.py). Is there anything I can do to make short text samples (e.g. 2 words) sound better?