XD-DENG / Rtts

R package "Rtts" - Text to Speech
http://cran.r-project.org/web/packages/Rtts/

shiny functionality #2

Closed germayneng closed 7 years ago

germayneng commented 7 years ago

Hi XD,

Really appreciate your work on Rtts. Good job! I am trying to implement this text-to-speech function in Shiny, so that any input text is converted and played as speech. However, the conversion does not happen in real time: I have to refresh the Shiny app to update the speech.

Below is my code for reference:

library(shiny)
library(Rtts)

ui <- fluidPage(

   # Application title
   titlePanel("Just some text to speech example"),

   fluidRow(textInput("caption","Enter the caption")),
   fluidRow(verbatimTextOutput("answer"), width = 4),

  # plays from www/  
  fluidRow(tags$audio(src = "sound.wav", type = "audio/wav", controls = NA), helpText("Key in any sentences, wait and press F5. Then you can play your audio")),

  fluidRow(selectInput("speak","Speaker:",
                       c("Bruce" = "Bruce",
                         "Theresa" = "Theresa",
                         "Angela" = "Angela",
                         "MCHEN_Bruce" = "MCHEN_Bruce",
                         "MCHEN_Joddess" = "MCHEN_Joddess",
                         "ENG_Bob" = "ENG_Bob",
                         "ENG_Alice" = "ENG_Alice",
                         "ENG_Tracy" = "ENG_Tracy")))

  )

server <- function(input, output) {

textfunction <- reactive({
    thetext <- input$caption
    tts_ITRI(thetext, destfile = "www/sound.wav", speaker = input$speak)
    "Done!"

        })

output$answer <- renderText({textfunction()})

}

# Run the application 
shinyApp(ui = ui, server = server)

XD-DENG commented 7 years ago

Hi @germayneng , thanks for using Rtts!

Actually, Rtts invokes an API for text-to-speech. Depending on the API server's status, processing may take a few seconds or longer. If it takes too long, the function returns a reminder suggesting that you try again later.

For your application, I have a few suggestions for your reference:

  1. Add a "Start Converting" button. The current UI is a bit confusing: does the conversion start once I enter the text, or when I click the 'Play' button?

  2. You may want to write the audio file to a temporary folder instead of hard-coding www/sound.wav, because a fixed path will cause errors when multiple users use the application at the same time. Storing the audio files under a temporary path may also be a plus in terms of security.

  3. Given the long latency (there is not much room for improvement, as it is 100% up to the API server), I would suggest designing the whole application as a "queue". Different users submit text for conversion, and the main interface is a table listing all requests together with their status ("done" or "failed") and a download link (the path of the audio file).
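
Suggestions 1 and 2 could look roughly like the sketch below. It is not code from Rtts or from this thread: the helper new_audio_path(), the directory name "tts_audio", the URL prefix "tts", and the input ID "convert" are all made up for illustration, and the speaker is fixed to "Bruce" for brevity. It assumes tts_ITRI() accepts destfile and speaker as in the code posted above.

```r
library(shiny)
library(Rtts)

# Assumption: audio files live in a per-session temporary directory, not www/.
audio_dir <- file.path(tempdir(), "tts_audio")
dir.create(audio_dir, showWarnings = FALSE, recursive = TRUE)

# Hypothetical helper: a fresh .wav path per request, so concurrent users
# never overwrite each other's audio.
new_audio_path <- function(dir = audio_dir) {
  tempfile(pattern = "speech_", tmpdir = dir, fileext = ".wav")
}

# Expose the temporary directory to the browser under the URL prefix "tts".
addResourcePath("tts", audio_dir)

ui <- fluidPage(
  textInput("caption", "Enter the caption"),
  actionButton("convert", "Start Converting"),
  uiOutput("player")
)

server <- function(input, output, session) {
  # The conversion runs only when the button is clicked, not on every keystroke.
  audio_file <- eventReactive(input$convert, {
    req(nzchar(input$caption))
    dest <- new_audio_path()
    tts_ITRI(input$caption, destfile = dest, speaker = "Bruce")
    dest
  })

  # The audio tag is rebuilt for each new file, so its src changes every time.
  output$player <- renderUI({
    tags$audio(src = paste0("tts/", basename(audio_file())),
               type = "audio/wav", controls = NA)
  })
}

# Build the app object; launch it with runApp(app).
app <- shinyApp(ui = ui, server = server)
```

Because every request gets its own file name, the player's src is different after each conversion, which also avoids serving a stale file from the browser cache.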

germayneng commented 7 years ago

Thank you @XD-DENG for the prompt reply,

I am using this to test the text-to-speech function, hence the messy and basic UI. I understand it's based on an API. I need a text-to-speech function for Shiny that returns sentences based on what the user inputs.

When the user inputs a sentence, I want my app to give a response, and this response will be converted to speech using your function. However, I do not know how to make it return and play the response, since what tts does is download an audio file. Hence, I am trying to find out whether real-time playback is possible.

As such, my purpose is not for users to convert their text input and play it in the app, but to test whether what I mentioned above is possible.

Hope you can guide me

XD-DENG commented 7 years ago

@germayneng correct me if I'm wrong: what you need is to play some specific speeches? For example, the app tells users "the process is running", "Congratulations, the job is done", etc., i.e. some FIXED content, right?

germayneng commented 7 years ago

@XD-DENG yes, correct! But instead of playing pre-created audio files, is it possible to create a character string, have tts convert it to speech, and then play it?

I am attempting to create a mini AI app, and I need to use a text-to-speech API. But so far, apart from Watson, your function is the only one that offers TTS for R.

XD-DENG commented 7 years ago

@germayneng My concern is still the latency. The API server is extremely unstable. My understanding is that your app needs a "real-time" response to users, which is almost impossible with this API.

A few commercial API services, such as those provided by Google and Microsoft, respond much faster, like this one: https://www.microsoft.com/cognitive-services/en-us/speech-api. But they charge for usage.

germayneng commented 7 years ago

@XD-DENG I understand the limitations. However, we still have not addressed the possibility of playing the audio in real time. Most API workflows in R send the string over and get back a downloaded audio file. But I do not know how to play it instantly from Shiny.

XD-DENG commented 7 years ago

@germayneng I did try to do that previously.

I haven't touched Shiny development for quite a long time. Give me some time and I will get back to you with a demo, possibly within this week.

germayneng commented 7 years ago

@XD-DENG Thank you XD! :D

I understand that R's strength does not lie in working with APIs. But I am doing a side project to create a mini AI. I will upload it to my GitHub sometime, and you can take a look if you are interested.

XD-DENG commented 7 years ago

Hi @germayneng , could you share your email address so that I can send the demo to you?

XD-DENG commented 7 years ago

@germayneng you can also go directly to https://github.com/XD-DENG/Reactively-Play-Audio-Shiny. I've hosted the demo code there.

germayneng commented 7 years ago

@XD-DENG Thanks XD. I have seen the demo.

I think you misunderstood what I meant. Is there a way to reactively get text input from the user and output the audio in real time? Similar to what I have created.

For reference, here is something I created: https://github.com/germayneng/Mini-Mobile-AI/blob/master/README.md

I am trying to get the app to return the string text as speech.

XD-DENG commented 7 years ago

@germayneng Given that the text users enter is "free text" and can be anything, your app can only rely on an API. But as we discussed earlier, the API latency is too high.

Your idea will only be doable if you are willing to use a high-quality commercial TTS API, like Microsoft's.

germayneng commented 7 years ago

@XD-DENG yup I understand the limitation.

From a technical standpoint, is it possible to do that in Shiny? Can you include it in your demo? I can try replacing the API with IBM Watson's.

I replaced your function with IBM's API. The response is instant (I want to point out that the problem is not the latency). However, when I play the audio, it plays the previous text. I have to press F5 to refresh the app in order to play the current one. Do you know how to fix this?
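
The thread ends without a confirmed answer. One plausible explanation, stated here as an assumption, is that the audio tag is static and always points at the same URL (sound.wav), so the browser keeps playing the copy it already loaded until the page is refreshed. A minimal sketch of a workaround is to re-render the audio tag after each conversion with a changing query string. The helper cache_busted_src() and the input ID "convert" are hypothetical; tts_ITRI() stands in for whichever TTS call is used, as long as it writes to destfile. The sketch assumes the www/ folder exists and that Shiny's static file serving ignores the query string.

```r
library(shiny)
library(Rtts)

# Hypothetical helper: append a version to the URL so the browser sees a new
# resource each time, even though the file name itself stays the same.
cache_busted_src <- function(src, version) {
  paste0(src, "?v=", version)
}

ui <- fluidPage(
  textInput("caption", "Enter the caption"),
  actionButton("convert", "Start Converting"),
  uiOutput("player")
)

server <- function(input, output, session) {
  audio_src <- eventReactive(input$convert, {
    req(nzchar(input$caption))
    # Any function that writes a WAV file to destfile could be swapped in here.
    tts_ITRI(input$caption, destfile = "www/sound.wav", speaker = "Bruce")
    # The button's click count serves as the changing version number.
    cache_busted_src("sound.wav", input$convert)
  })

  # Re-rendering the tag with a new src makes the browser fetch the new audio.
  # autoplay may be blocked by some browsers; the controls remain as a fallback.
  output$player <- renderUI({
    tags$audio(src = audio_src(), type = "audio/wav",
               controls = NA, autoplay = NA)
  })
}

# Build the app object; launch it with runApp(app).
app <- shinyApp(ui = ui, server = server)
```

This keeps the fixed www/sound.wav path from the original code, so it still has the multi-user overwrite problem raised earlier in the thread; writing each result to a uniquely named file would address both issues at once.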