Closed gwholland3 closed 4 years ago
Sure! I haven't really done that kind of thing before, though. How would I do it? Are there certain tools you can use for that?
@snekiam @Jason-Ku did memory benchmarking on the speech-to-text side, so they can point you to a library for that. timeit is a solid solution for runtime; it is what I used to benchmark the wake-word inference time on the different RPis. Really simple setup. @gwholland3
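For reference, a minimal sketch of what a timeit-based runtime measurement could look like. The helper name time_runs and the stand-in function fake_tts are made up for illustration; the real text-to-speech call would take the place of fake_tts.

```python
import time
import timeit


def time_runs(func, text, repeat=3):
    """Time func(text) `repeat` times and return the durations in seconds."""
    # number=1 means each measurement covers exactly one call to func(text).
    return timeit.repeat(lambda: func(text), repeat=repeat, number=1)


def fake_tts(text):
    # Hypothetical stand-in for the real call; sleeps in proportion to length.
    time.sleep(0.001 * len(text))


runs = time_runs(fake_tts, "What time is it?")
print("runs (s):", runs)
print("average (s):", sum(runs) / len(runs))
```

timeit.repeat returns one duration per repetition, so the per-run values and their average both come out of a single call.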
Ok, @chidiewenike how should I structure the benchmarking? Strings of various character lengths, word lengths, speech times...? Also how many trials should I run?
@gwholland3 I will send you a doc via Slack with question/answer pairs. Just run the answer for each QA pair and get the runtime/memory metrics. Do 3 runs per string and also record the average of those three runs. Check out the header example below. Let me know if you have any questions.
Ex Header per library (each column is delimited with a | ):
String |
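A rough sketch of how the three runs, their average, and a memory figure could be collected and written out as one pipe-delimited row. Everything here is an assumption made for illustration: the helper names benchmark_string and format_row, the use of the standard-library tracemalloc module for memory, and the column order, since the header above is cut off after "String |".

```python
import time
import tracemalloc


def benchmark_string(func, text, runs=3):
    """Call func(text) `runs` times.

    Returns (durations in seconds, mean duration, peak traced bytes).
    """
    durations = []
    peak_bytes = 0
    for _ in range(runs):
        tracemalloc.start()
        start = time.perf_counter()
        func(text)
        durations.append(time.perf_counter() - start)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        peak_bytes = max(peak_bytes, peak)
    return durations, sum(durations) / runs, peak_bytes


def format_row(text, durations, mean, peak_bytes):
    # Assumed column order: string, each run, average, peak memory in bytes.
    cells = [text] + [f"{d:.4f}" for d in durations]
    cells += [f"{mean:.4f}", str(peak_bytes)]
    return " | ".join(cells)
```

Two caveats: tracemalloc only sees allocations made through Python's allocator, so memory used by a native speech engine would not show up, and tracing adds some overhead to the timings. If that matters, the timing and memory passes could be run separately.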
I will open a separate issue for this and approve this PR.
Also, let's try running this on a RPi this weekend.
Ok, sounds good
Summary
Added a file, text_to_speech.py, that uses the offline Python library pyttsx3 to implement text-to-speech functionality.
Fixes #1
Details
There are two functions in the file, tts_options() and tts_default(). Both of them take in a string input, and play the equivalent audio within the function. However, while tts_default() simply generates the audio using the default settings (which are described in the function), tts_options() allows the user to select the speech rate, type of voice, and volume as well.
Testing
All of the additional arguments in tts_options() are argument checked, and all known errors raised by the library functions themselves are handled.
I tested these functions by simply feeding them different strings and making sure I got the appropriate audio output.
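For readers without the file at hand, here is a minimal sketch of what an argument-checked function along these lines might look like. It is not the code from text_to_speech.py: the parameter names (rate, volume, voice_index), the helper validate_options, and the optional engine argument are all assumptions for illustration. The pyttsx3 calls used (init, getProperty, setProperty, say, runAndWait) are the library's documented engine methods.

```python
def validate_options(rate, volume, voice_index, voice_count):
    """Raise ValueError if any option is out of range (assumed rules)."""
    if isinstance(rate, bool) or not isinstance(rate, int) or rate <= 0:
        raise ValueError("rate must be a positive integer (words per minute)")
    if (isinstance(volume, bool) or not isinstance(volume, (int, float))
            or not 0.0 <= volume <= 1.0):
        raise ValueError("volume must be a number between 0.0 and 1.0")
    if (isinstance(voice_index, bool) or not isinstance(voice_index, int)
            or not 0 <= voice_index < voice_count):
        raise ValueError("voice_index must select one of the installed voices")


def tts_options(text, rate=200, volume=1.0, voice_index=0, engine=None):
    """Speak `text` with the chosen rate, volume, and voice (sketch only)."""
    if not isinstance(text, str):
        raise TypeError("text must be a string")
    if engine is None:
        # Imported lazily so the checks can be exercised without audio hardware.
        import pyttsx3
        engine = pyttsx3.init()
    voices = engine.getProperty("voices")
    validate_options(rate, volume, voice_index, len(voices))
    engine.setProperty("rate", rate)
    engine.setProperty("volume", volume)
    engine.setProperty("voice", voices[voice_index].id)
    engine.say(text)
    engine.runAndWait()
```

Passing the engine in as an argument is only a convenience for testing the checks with a stub object; handling of errors raised by the library itself is left out because the thread does not say which errors those are.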
Issues To Consider
While I never had a problem with the audio itself, I noticed that with both library functions used to produce the audio (runAndWait() and speak()) there is a delay of about 1-2 seconds between the audio finishing playing and the function returning.
However, this is not always the case. The longer the input string is, the shorter the delay, and with a long enough string there is no delay at all.
I was not able to figure out why this occurs and am curious if others could help shed some light on it.
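One way to put numbers on this delay, sketched under assumptions: pyttsx3 lets a callback be registered for the 'finished-utterance' event via engine.connect(), so the gap between that event and runAndWait() returning can be timed. The helper name measure_return_delay is made up, and whether the event fires exactly when the audio audibly ends may depend on the platform's speech driver.

```python
import time


def measure_return_delay(engine, text):
    """Seconds between 'finished-utterance' firing and runAndWait() returning.

    Returns None if the event never fired.
    """
    finished_at = []

    def on_end(name, completed):
        # Record the moment the engine reports the utterance as finished.
        finished_at.append(time.perf_counter())

    token = engine.connect("finished-utterance", on_end)
    try:
        engine.say(text)
        engine.runAndWait()
        returned_at = time.perf_counter()
    finally:
        engine.disconnect(token)

    if not finished_at:
        return None
    return returned_at - finished_at[-1]
```

With a real engine from pyttsx3.init(), calling this on strings of increasing length would show whether the gap really shrinks as the input grows, which might help narrow down where the extra time is spent.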