RestComm / Restcomm-Connect

The Open Source Cloud Communications Platform
http://www.restcomm.com/
GNU Affero General Public License v3.0

Integrate AWS Polly text-to-speech engine #1618

Closed deruelle closed 7 years ago

deruelle commented 7 years ago

See https://aws.amazon.com/polly/

rlimonta commented 7 years ago

Very nice Jean!! Will be very useful to us.

deruelle commented 7 years ago

@rlimonta would you be willing to contribute it ?

rlimonta commented 7 years ago

Yes.

deruelle commented 7 years ago

Awesome, let me know if you need anything

rlimonta commented 7 years ago

@deruelle I'm going to start studying the Amazon Polly API. As soon as I have a drawing of the integration, I will send it for your approval.

deruelle commented 7 years ago

@rlimonta great. Were you able to make progress on that already ?

rlimonta commented 7 years ago

@deruelle I studied AWS Polly, and we will be able to implement the integration.

We will need to create new keys in the configuration file so that the access keys can be configured.

AWS_ACCESS_KEY=""
AWS_SECRET_KEY=""

We also have the possibility to configure the AWS Region. In this case, I would like to know your opinion.

The implementation will look like the VoiceRSS one, but we will have a new dependency on Polly's SDK.

I created the class and component diagrams to represent the development strategy.

Class diagram: [image: tts aws polly module]

Component diagram: [image: tts aws polly component diagram]

gvagenas commented 7 years ago

@rlimonta thanks for contributing the AWS TTS plugin.

I am not sure about the diagrams you provided, but the new TTS plugin should implement the TTS API ("restcomm.tts.api"), similar to what the VoiceRSS plugin does. Regarding the configuration, you should add one more element to the restcomm.xml configuration file, similar to the one we have for VoiceRSS. The Bootstrapper will make sure to pass this configuration to the implementation of the TTS plugin.

Let me know if you need any help.

George

rlimonta commented 7 years ago

@gvagenas I meant similar in terms of strategy. The implementation will be specific to AWS Polly API. I'll detail the diagrams to be clearer.

deruelle commented 7 years ago

@rlimonta were you able to update the diagrams yet and move forward with implementation ?

rlimonta commented 7 years ago

@gvagenas @deruelle I am sending the new version of the diagrams. I want to finalize a draft of the implementation by tomorrow, and I will send it to you.

[image: tts aws polly component diagram]

[image: tts aws polly module]

rlimonta commented 7 years ago

@deruelle I'm implementing the integration with AWS Polly and I have a question:

Polly works with specific voice IDs, each tied to one language and one gender.

Not all languages have both a male and a female voice, and some, such as en-US, have more than one voice per gender.

see https://aws.amazon.com/pt/blogs/aws/polly-text-to-speech-in-47-voices-and-24-languages/

List of voice IDs:

{Gender: Female, Id: Joanna, LanguageCode: en-US, LanguageName: US English, Name: Joanna}
{Gender: Female, Id: Mizuki, LanguageCode: ja-JP, LanguageName: Japanese, Name: Mizuki}
{Gender: Female, Id: Filiz, LanguageCode: tr-TR, LanguageName: Turkish, Name: Filiz}
{Gender: Female, Id: Astrid, LanguageCode: sv-SE, LanguageName: Swedish, Name: Astrid}
{Gender: Male, Id: Maxim, LanguageCode: ru-RU, LanguageName: Russian, Name: Maxim}
{Gender: Female, Id: Tatyana, LanguageCode: ru-RU, LanguageName: Russian, Name: Tatyana}
{Gender: Female, Id: Carmen, LanguageCode: ro-RO, LanguageName: Romanian, Name: Carmen}
{Gender: Female, Id: Ines, LanguageCode: pt-PT, LanguageName: Portuguese, Name: Inês}
{Gender: Male, Id: Cristiano, LanguageCode: pt-PT, LanguageName: Portuguese, Name: Cristiano}
{Gender: Female, Id: Vitoria, LanguageCode: pt-BR, LanguageName: Brazilian Portuguese, Name: Vitória}
{Gender: Male, Id: Ricardo, LanguageCode: pt-BR, LanguageName: Brazilian Portuguese, Name: Ricardo}
{Gender: Female, Id: Maja, LanguageCode: pl-PL, LanguageName: Polish, Name: Maja}
{Gender: Male, Id: Jan, LanguageCode: pl-PL, LanguageName: Polish, Name: Jan}
{Gender: Female, Id: Ewa, LanguageCode: pl-PL, LanguageName: Polish, Name: Ewa}
{Gender: Male, Id: Ruben, LanguageCode: nl-NL, LanguageName: Dutch, Name: Ruben}
{Gender: Female, Id: Lotte, LanguageCode: nl-NL, LanguageName: Dutch, Name: Lotte}
{Gender: Female, Id: Liv, LanguageCode: nb-NO, LanguageName: Norwegian, Name: Liv}
{Gender: Male, Id: Giorgio, LanguageCode: it-IT, LanguageName: Italian, Name: Giorgio}
{Gender: Female, Id: Carla, LanguageCode: it-IT, LanguageName: Italian, Name: Carla}
{Gender: Male, Id: Karl, LanguageCode: is-IS, LanguageName: Icelandic, Name: Karl}
{Gender: Female, Id: Dora, LanguageCode: is-IS, LanguageName: Icelandic, Name: Dóra}
{Gender: Male, Id: Mathieu, LanguageCode: fr-FR, LanguageName: French, Name: Mathieu}
{Gender: Female, Id: Celine, LanguageCode: fr-FR, LanguageName: French, Name: Céline}
{Gender: Female, Id: Chantal, LanguageCode: fr-CA, LanguageName: Canadian French, Name: Chantal}
{Gender: Female, Id: Penelope, LanguageCode: es-US, LanguageName: US Spanish, Name: Penélope}
{Gender: Male, Id: Miguel, LanguageCode: es-US, LanguageName: US Spanish, Name: Miguel}
{Gender: Male, Id: Enrique, LanguageCode: es-ES, LanguageName: Castilian Spanish, Name: Enrique}
{Gender: Female, Id: Conchita, LanguageCode: es-ES, LanguageName: Castilian Spanish, Name: Conchita}
{Gender: Male, Id: Geraint, LanguageCode: en-GB-WLS, LanguageName: Welsh English, Name: Geraint}
{Gender: Female, Id: Salli, LanguageCode: en-US, LanguageName: US English, Name: Salli}
{Gender: Female, Id: Kimberly, LanguageCode: en-US, LanguageName: US English, Name: Kimberly}
{Gender: Female, Id: Kendra, LanguageCode: en-US, LanguageName: US English, Name: Kendra}
{Gender: Male, Id: Justin, LanguageCode: en-US, LanguageName: US English, Name: Justin}
{Gender: Male, Id: Joey, LanguageCode: en-US, LanguageName: US English, Name: Joey}
{Gender: Female, Id: Ivy, LanguageCode: en-US, LanguageName: US English, Name: Ivy}
{Gender: Female, Id: Raveena, LanguageCode: en-IN, LanguageName: Indian English, Name: Raveena}
{Gender: Female, Id: Emma, LanguageCode: en-GB, LanguageName: British English, Name: Emma}
{Gender: Male, Id: Brian, LanguageCode: en-GB, LanguageName: British English, Name: Brian}
{Gender: Female, Id: Amy, LanguageCode: en-GB, LanguageName: British English, Name: Amy}
{Gender: Male, Id: Russell, LanguageCode: en-AU, LanguageName: Australian English, Name: Russell}
{Gender: Female, Id: Nicole, LanguageCode: en-AU, LanguageName: Australian English, Name: Nicole}
{Gender: Female, Id: Marlene, LanguageCode: de-DE, LanguageName: German, Name: Marlene}
{Gender: Male, Id: Hans, LanguageCode: de-DE, LanguageName: German, Name: Hans}
{Gender: Female, Id: Naja, LanguageCode: da-DK, LanguageName: Danish, Name: Naja}
{Gender: Male, Id: Mads, LanguageCode: da-DK, LanguageName: Danish, Name: Mads}
{Gender: Female, Id: Gwyneth, LanguageCode: cy-GB, LanguageName: Welsh, Name: Gwyneth}
{Gender: Male, Id: Jacek, LanguageCode: pl-PL, LanguageName: Polish, Name: Jacek}

deruelle commented 7 years ago

hey @rlimonta !

I'd like to have @gvagenas's opinion on that, but to get you moving, here is what I would suggest:

  1. Let's pick one default English voice per gender (one male, one female) and create a new GitHub issue to support multiple voices per language.

  2. If there is no female voice for a language, use the male voice for female as well, and open an issue to disallow a specific gender for a given language.
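The two points above can be sketched roughly as follows. This is only an illustration in Python, not the plugin code: the `VOICES` subset is copied from the list earlier in the thread, while `build_defaults` and `pick_voice` are hypothetical helper names, and "first voice seen" is an assumed rule for choosing the default.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative subset of the voice list quoted earlier in the thread.
VOICES: List[dict] = [
    {"Gender": "Female", "Id": "Joanna", "LanguageCode": "en-US"},
    {"Gender": "Female", "Id": "Salli", "LanguageCode": "en-US"},
    {"Gender": "Male", "Id": "Justin", "LanguageCode": "en-US"},
    {"Gender": "Female", "Id": "Liv", "LanguageCode": "nb-NO"},
    {"Gender": "Male", "Id": "Geraint", "LanguageCode": "en-GB-WLS"},
]


def build_defaults(voices: List[dict]) -> Dict[Tuple[str, str], str]:
    """Map (language code, gender) to the first voice id seen for that pair."""
    defaults: Dict[Tuple[str, str], str] = {}
    for voice in voices:
        key = (voice["LanguageCode"], voice["Gender"].lower())
        # setdefault keeps the first voice, so one default per language/gender.
        defaults.setdefault(key, voice["Id"])
    return defaults


def pick_voice(defaults: Dict[Tuple[str, str], str],
               language: str, gender: str) -> Optional[str]:
    """Return the default voice, falling back to the other gender if needed."""
    gender = gender.lower()
    other = "male" if gender == "female" else "female"
    return defaults.get((language, gender)) or defaults.get((language, other))
```

With this subset, asking for a male nb-NO voice would return Liv, because the list has no male Norwegian voice; an unknown language returns `None` so the caller can decide what to do.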

rlimonta commented 7 years ago

Ok @deruelle, I'll wait for @gvagenas's opinion.

rlimonta commented 7 years ago

@deruelle @gvagenas Guys, we have another option: discovering the supported languages dynamically. For this, we would have to make one more call to the Polly API per request.

I would need to receive the selected language in the pattern (ISO 639 language code)-(ISO 3166 country code), for example pt-BR for Brazilian Portuguese.

In this way, with each new language supported by AWS Polly, the platform would be ready.
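As a rough illustration of the language-code pattern described above, here is a small Python sketch. The function name `parse_language_code` is hypothetical, and the optional third subtag is an assumption added only because the voice list in this thread contains en-GB-WLS.

```python
import re
from typing import Optional, Tuple

# language (2-3 lowercase letters) - country (2 uppercase letters),
# optionally followed by one extra subtag such as the "WLS" in en-GB-WLS.
LANGUAGE_CODE = re.compile(r"([a-z]{2,3})-([A-Z]{2})(?:-([A-Za-z0-9]+))?")


def parse_language_code(code: str) -> Tuple[str, str, Optional[str]]:
    """Split a code like 'pt-BR' into (language, country, extra subtag)."""
    match = LANGUAGE_CODE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a language-country code: {code!r}")
    language, country, extra = match.groups()
    return language, country, extra
```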

deruelle commented 7 years ago

@rlimonta we can potentially cache the result in memory to avoid the additional call per request, right?

rlimonta commented 7 years ago

@deruelle yes!
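A minimal sketch of such an in-memory cache, in Python for illustration only. `VoiceCache` and its `fetch` argument are hypothetical: `fetch` stands in for whatever call lists the Polly voices, and the one-hour expiry is an assumed value, not something decided in this thread.

```python
import time
from typing import Callable, List, Optional


class VoiceCache:
    """Keep the voice list in memory and refresh it only after it expires."""

    def __init__(self, fetch: Callable[[], List[dict]],
                 ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # stand-in for the remote voice listing call
        self._ttl = ttl_seconds      # assumed expiry, tune as needed
        self._clock = clock          # injectable so the cache can be tested
        self._voices: Optional[List[dict]] = None
        self._expires_at = 0.0

    def get(self) -> List[dict]:
        now = self._clock()
        if self._voices is None or now >= self._expires_at:
            # Only this branch pays for the extra remote call.
            self._voices = self._fetch()
            self._expires_at = now + self._ttl
        return self._voices
```

Only the first request after start-up or after expiry triggers the extra call; every other request is served from memory. This sketch is not thread-safe, so a real plugin would need a lock or an equivalent around the refresh.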

gvagenas commented 7 years ago

@rlimonta to keep all TTS plugins in sync, we keep a list of languages/genders that is common to all plugins, so in the RCML the user only has to pick a language and a gender.

Also, we have some checks for exceptions. For example, if the user asks for a female voice but the TTS plugin doesn't support one, we default to the male gender.

I suggest you check the existing languages/genders we support for VoiceRSS and Acapela and conform to this list. Later we can check how to add the missing languages/genders we would like to have.

Also, the restcomm.xml should change to hide the language/gender setting of each TTS service, but we can make that a separate task.

About the dynamic discovery, I think we have to keep the requests to a minimum in order to avoid additional delays. Keep in mind we have a live call waiting for the announcement, so every ms counts.

George

rlimonta commented 7 years ago

@gvagenas AWS Polly works like Acapela. If we create a configuration like the Acapela speakers, I believe it will work very well. I had not seen this alternative.

gvagenas commented 7 years ago

@rlimonta great, let's proceed like this. Later we can check all the available languages/genders again and modify the configuration accordingly.

rlimonta commented 7 years ago

@gvagenas Ok!

rlimonta commented 7 years ago

@gvagenas and @deruelle, is there any utility to convert PCM to WAV? AWS Polly only supports the MP3, Ogg Vorbis, and PCM formats. If not, I can implement this.

gvagenas commented 7 years ago

@rlimonta no, we don't have such a util. You can proceed to create it and add it to the restcomm.commons module in case other TTS services need it later.

rlimonta commented 7 years ago

@gvagenas Ok!!

deruelle commented 7 years ago

@rlimonta Polly offers telephony-quality (8 kHz) audio in PCM format, see https://aws.amazon.com/blogs/aws/polly-text-to-speech-in-47-voices-and-24-languages/. This is the right format for a WAV file for the Media Server to consume. I think if you just save it with a .wav extension it will work. @hrosa can confirm.

hrosa commented 7 years ago

RestComm Media Server format is .wav (Sample rate of 8000Hz, bit rate of 8, Mono)

Helpful command to convert audio files for RMS compliance: ffmpeg -i source_file.wav -acodec pcm_s16le -ac 1 -ar 8000 result_file.wav
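For the PCM-to-WAV utility discussed above, a minimal sketch in Python (illustration only, not the utility that was later contributed). Raw PCM has no header, so the bytes need to be wrapped in a RIFF/WAVE container rather than only renamed. The defaults assume 16-bit signed little-endian mono samples at 8000 Hz, matching the `pcm_s16le -ac 1 -ar 8000` options in the ffmpeg command; `pcm_to_wav` is a hypothetical name.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 8000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)        # 1 = mono
        wav_file.setsampwidth(sample_width)    # bytes per sample, 2 = 16-bit
        wav_file.setframerate(sample_rate)     # samples per second
        wav_file.writeframes(pcm)              # header sizes are filled in on close
    return buffer.getvalue()
```

Usage would be along the lines of `open("result_file.wav", "wb").write(pcm_to_wav(raw_bytes))`, where `raw_bytes` is the PCM audio returned by the TTS service.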

rlimonta commented 7 years ago

Ok @hrosa, Thank you!

deruelle commented 7 years ago

@rlimonta were you able to make it work with the command from @hrosa ?

rlimonta commented 7 years ago

@deruelle @hrosa I implemented a utility class to make the conversion. I want to finish today and submit for your evaluation.

rlimonta commented 7 years ago

@deruelle I finished the implementation, but I am not allowed to create a new branch. Could you create a branch for issue #1618?

deruelle commented 7 years ago

Great @rlimonta ! Please do a pull request against master as described in the Open Source Playbook

rlimonta commented 7 years ago

@deruelle done.

Vaibhavarora08 commented 6 years ago

How can we implement Polly for bilingual languages? I tried using the language code, but it's not working. Can someone please help?