calpoly-csai / swanton

Swanton Pacific Ranch chatbot with a knowledge graph
MIT License

Adding text-to-speech function #9

Closed gwholland3 closed 4 years ago

gwholland3 commented 4 years ago

Summary

Added a file, text_to_speech.py, that uses the offline Python library pyttsx3 to implement text-to-speech functionality.

Fixes #1

Details

There are two functions in the file, tts_options() and tts_default().

Both take a string as input and play the corresponding audio before returning. tts_default() generates the audio with the default settings (described in the function), while tts_options() also lets the caller select the speech rate, voice, and volume.
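For readers without the file open, here is a minimal sketch of how two such functions could be written with pyttsx3. It is not the code from this PR: the `engine` parameter, the `validate_options` helper, the `voice_index` argument, and the default values are assumptions made for illustration. Only `pyttsx3.init()`, `say()`, `runAndWait()`, `getProperty()`, and `setProperty()` are library calls.

```python
DEFAULT_RATE = 200    # words per minute; assumed default, not taken from the PR
DEFAULT_VOLUME = 1.0  # pyttsx3 expects a volume between 0.0 and 1.0


def validate_options(rate, volume, voice_index, voice_count):
    """Raise ValueError for out-of-range settings (hypothetical checks)."""
    if isinstance(rate, bool) or not isinstance(rate, int) or rate <= 0:
        raise ValueError("rate must be a positive integer (words per minute)")
    if isinstance(volume, bool) or not isinstance(volume, (int, float)):
        raise ValueError("volume must be a number")
    if not 0.0 <= volume <= 1.0:
        raise ValueError("volume must be between 0.0 and 1.0")
    if not isinstance(voice_index, int) or not 0 <= voice_index < voice_count:
        raise ValueError("voice_index must be in range(%d)" % voice_count)


def _get_engine(engine):
    """Return the given engine, or create a pyttsx3 engine on demand."""
    if engine is None:
        import pyttsx3  # imported lazily so this module loads without audio support
        engine = pyttsx3.init()
    return engine


def tts_default(text, engine=None):
    """Speak text using the engine's default settings."""
    engine = _get_engine(engine)
    engine.say(text)
    engine.runAndWait()  # blocks until the queued utterance has been processed


def tts_options(text, rate=DEFAULT_RATE, volume=DEFAULT_VOLUME,
                voice_index=0, engine=None):
    """Speak text with a caller-selected rate, volume, and voice."""
    engine = _get_engine(engine)
    voices = engine.getProperty("voices")
    validate_options(rate, volume, voice_index, len(voices))
    engine.setProperty("rate", rate)
    engine.setProperty("volume", volume)
    engine.setProperty("voice", voices[voice_index].id)
    engine.say(text)
    engine.runAndWait()
```

Passing the engine in as an optional argument is a design choice of this sketch: it lets the argument checks be exercised with a stand-in object on a machine that has no speech driver installed.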

Testing

All of the additional arguments to tts_options() are validated, and all known errors raised by the library functions themselves are handled.

I tested these functions by feeding them different strings and confirming that I got the appropriate audio output.

Issues To Consider

While I never had a problem with the audio itself, I noticed that with both library functions used to produce the audio (runAndWait() and speak()), there is a delay of about 1-2 seconds between the audio finishing and the function returning.

However, this is not always the case: the longer the input string, the shorter the delay, and with a long enough string there is no delay at all.

I was not able to figure out why this occurs and am curious if others could help shed some light on it.
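One way to put a number on this delay is to compare the time at which pyttsx3 reports the utterance as finished with the time at which runAndWait() returns. The sketch below assumes that the `finished-utterance` event fires close to the end of audible playback, which may not hold for every driver. `measure_return_delay` is a hypothetical helper, not part of this PR.

```python
import time


def measure_return_delay(engine, text):
    """Seconds between the 'finished-utterance' event and runAndWait() returning.

    Returns None if the engine never reported the utterance as finished.
    """
    finished_at = []

    def on_finished(name, completed):
        # pyttsx3 passes these as keyword arguments when the utterance ends.
        finished_at.append(time.perf_counter())

    token = engine.connect("finished-utterance", on_finished)
    try:
        engine.say(text)
        engine.runAndWait()
        returned_at = time.perf_counter()
    finally:
        engine.disconnect(token)

    if not finished_at:
        return None
    return returned_at - finished_at[-1]
```

Calling it as `measure_return_delay(pyttsx3.init(), "some string")` for strings of different lengths would show whether the gap really shrinks as the input grows, or whether the event itself arrives late.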

gwholland3 commented 4 years ago

Sure! I haven't really done that kind of thing before, though. How would I do it? Are there certain tools you can use for that?

chidiewenike commented 4 years ago

@snekiam and @Jason-Ku did memory benchmarking on the speech-to-text side, so they can point you to a library for that. timeit is a solid solution for runtime, and it is what I used to benchmark the wake-word inference time on the different RPis. Really simple setup. @gwholland3
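As a rough illustration of the timeit setup mentioned here, the sketch below times a single call per run with `timeit.repeat`. `speak_stub` and `time_call` are hypothetical names. The stub stands in for a real TTS call so the snippet runs without audio hardware, and it would be swapped for tts_default in an actual benchmark.

```python
import timeit


def speak_stub(text):
    """Stand-in for a TTS call; replace with the real function when benchmarking."""
    return len(text.split())


def time_call(func, text, runs=3):
    """Return one wall-clock measurement in seconds per run of func(text)."""
    # number=1 times a single call; repeat=runs collects that many measurements.
    return timeit.repeat(lambda: func(text), number=1, repeat=runs)


times = time_call(speak_stub, "Swanton Pacific Ranch")
print(["%.6f" % t for t in times])
```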

gwholland3 commented 4 years ago

Ok, @chidiewenike, how should I structure the benchmarking? Strings of various character lengths, word lengths, speech times...? Also, how many trials should I run?

chidiewenike commented 4 years ago

@gwholland3 I will send you a doc via Slack with question/answer pairs. Just run the answer for each QA pair and record the runtime/memory metrics. Do 3 runs per string and also report the average of the three runs. Check out the header example below. Let me know if you have any questions.

Example header per library (columns are delimited with |): String | Runtime#1 | Memory Usage#1 | ... | Runtime#N | Memory Usage#N | Runtime Avg | Memory Usage Avg
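A small sketch of how one row in that layout could be produced is shown below. It assumes `time.perf_counter` for runtime and `tracemalloc` for memory. Note that tracemalloc only tracks allocations made through Python, so memory used natively by a speech driver would not show up, and it is not necessarily the tool used on the speech-to-text side. `header` and `benchmark_row` are hypothetical helpers.

```python
import time
import tracemalloc


def header(runs=3):
    """Build the pipe-delimited header for the given number of runs."""
    cols = ["String"]
    for i in range(1, runs + 1):
        cols += ["Runtime#%d" % i, "Memory Usage#%d" % i]
    cols += ["Runtime Avg", "Memory Usage Avg"]
    return " | ".join(cols)


def benchmark_row(func, text, runs=3):
    """Run func(text) `runs` times and return one pipe-delimited result row."""
    runtimes, peaks = [], []
    for _ in range(runs):
        tracemalloc.start()
        start = time.perf_counter()
        func(text)
        runtimes.append(time.perf_counter() - start)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
        tracemalloc.stop()
        peaks.append(peak)

    cells = [text]
    for runtime, peak in zip(runtimes, peaks):
        cells += ["%.4f" % runtime, str(peak)]
    cells += ["%.4f" % (sum(runtimes) / runs), "%.0f" % (sum(peaks) / runs)]
    return " | ".join(cells)


print(header())
print(benchmark_row(lambda s: s.upper(), "hello ranch"))
```

Runtimes are in seconds and memory figures in bytes. An answer string that itself contains a | character would break the column layout, so such strings would need escaping or a different delimiter.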

chidiewenike commented 4 years ago

I will open a separate issue for this and approve this PR.

chidiewenike commented 4 years ago

Also, let's try running this on a RPi this weekend.

gwholland3 commented 4 years ago

Ok, sounds good