chrisjp / tts

A simple tool to demo text-to-speech using various services' voices. HTML5 and Vanilla JS.
https://lazypy.ro/tts
MIT License

IBM Watson voices not generating audio #6

Closed chrisjp closed 1 year ago

chrisjp commented 1 year ago

The IBM Watson demo website has changed how it functions. The current methods don't raise any errors, but they return empty mp3 files.

I've noticed that several requests are now being made. First, on page load a POST is sent to https://www.ibm.com/demos/live/tts-demo/api/tts/session to initialize a session. Every time the user pauses typing in the textarea, another POST is made to the same URL to check that the session ID still exists. If you leave the page open for more than a few minutes it won't initialize a new session; you have to refresh manually.

When you click play, a POST request is sent to https://www.ibm.com/demos/live/tts-demo/api/tts/store with JSON data containing the session ID and the text to be synthesized, which is then stored in the session:

{
    "ssmlText":"<prosody pitch=\"0%\" rate=\"-0%\">Whatever text you want spoken here</prosody>",
    "sessionID":"ca47b5a9-27ad-4d2e-9b9f-38fa548b4140"
}
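For reference, a minimal Python sketch of how a body like the one above could be built. The function name `build_store_payload` and the XML-escaping of the text are my own assumptions for illustration; the issue doesn't say how the demo escapes user input.

```python
import json
from xml.sax.saxutils import escape


def build_store_payload(text, session_id, pitch="0%", rate="-0%"):
    """Return the JSON body for the store endpoint as a string (hypothetical helper)."""
    # Assumption: escape &, < and > so user text cannot break the SSML markup.
    ssml = f'<prosody pitch="{pitch}" rate="{rate}">{escape(text)}</prosody>'
    # json.dumps takes care of escaping the double quotes inside the SSML.
    return json.dumps({"ssmlText": ssml, "sessionID": session_id})
```

Calling `build_store_payload("Whatever text you want spoken here", "ca47b5a9-27ad-4d2e-9b9f-38fa548b4140")` should produce the same two fields shown above.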

Finally, we GET a URL like the following, containing the voice ID and session ID, e.g. https://www.ibm.com/demos/live/tts-demo/api/tts/newSynthesizer?voice=en-GB_KateV3Voice&id=c4e4d21f-efd4-40a8-b696-e486ea4c6a5b It appears the session is valid for only a very short time, perhaps a couple of minutes, and consequently the audio is quickly discarded too.
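A small Python sketch of assembling that GET URL from its two query parameters. `synth_url` is a hypothetical helper name; the base URL and parameter names are taken from the example above.

```python
from urllib.parse import urlencode

SYNTH_ENDPOINT = "https://www.ibm.com/demos/live/tts-demo/api/tts/newSynthesizer"


def synth_url(voice, session_id):
    """Build the newSynthesizer URL for a given voice ID and session ID."""
    # urlencode keeps the parameter order and percent-encodes anything unsafe.
    return f"{SYNTH_ENDPOINT}?{urlencode({'voice': voice, 'id': session_id})}"
```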

This hopefully won't be too much of an issue to fix, as we can just make all the requests in succession via the proxy.php script.

chrisjp commented 1 year ago

I've now managed to get these voices working again. Commenting here in case I need it for future reference.

  1. First, it's necessary to POST to the session endpoint and save the cookie data it responds with. Unlike their demo site, we don't need to send any subsequent requests to this endpoint - I'm not sure why they even do that themselves.
  2. The next request goes to the store endpoint, and it turns out you can just fake a UUID-like string to act as the session ID. The cookies from the first request need to be sent in the headers, and the correct Content-Type (for JSON) also needs to be set.
  3. The final request (a GET this time) goes to a URL like the one I showed in the OP, with the voice ID and UUID as query parameters. Again, the cookies from the first request need to be sent in the headers, otherwise it won't return anything or will error. If done correctly it'll return the raw mp3 audio data, which can then either be saved to a file or base64 encoded to play directly in the browser.
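The three steps above can be sketched roughly as follows. This is a Python illustration, not the code in proxy.php, and the endpoints may well have changed again since. Assumptions: the session POST is sent with an empty body (the issue doesn't say what the demo sends), the text is XML-escaped before being wrapped in the prosody tag, and the names `make_opener`, `synthesize` and `to_data_uri` are made up for this example.

```python
import base64
import json
import urllib.parse
import urllib.request
import uuid
from http.cookiejar import CookieJar
from xml.sax.saxutils import escape

BASE = "https://www.ibm.com/demos/live/tts-demo/api/tts"


def make_opener():
    """Opener whose cookie jar stores cookies from step 1 and replays them later."""
    jar = CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def synthesize(text, voice, opener=None):
    """Run the session -> store -> newSynthesizer sequence and return the mp3 bytes."""
    if opener is None:
        opener = make_opener()
    session_id = str(uuid.uuid4())  # a faked UUID-like string is accepted per step 2

    # Step 1: POST to the session endpoint so the cookie jar picks up the cookies.
    # Assumption: an empty request body is enough.
    req = urllib.request.Request(f"{BASE}/session", data=b"", method="POST")
    with opener.open(req) as resp:
        resp.read()

    # Step 2: store the text under our session ID, sent as JSON.
    ssml = f'<prosody pitch="0%" rate="-0%">{escape(text)}</prosody>'
    body = json.dumps({"ssmlText": ssml, "sessionID": session_id}).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE}/store",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener.open(req) as resp:
        resp.read()

    # Step 3: GET the audio, passing the voice ID and the same UUID.
    query = urllib.parse.urlencode({"voice": voice, "id": session_id})
    req = urllib.request.Request(f"{BASE}/newSynthesizer?{query}", method="GET")
    with opener.open(req) as resp:
        return resp.read()


def to_data_uri(mp3_bytes):
    """Base64-encode mp3 data so a browser can play it directly from a data URI."""
    return "data:audio/mpeg;base64," + base64.b64encode(mp3_bytes).decode("ascii")
```

Something like `open("out.mp3", "wb").write(synthesize("Hello", "en-GB_KateV3Voice"))` would save the result to a file, while `to_data_uri(...)` gives a string that can be used as the `src` of an audio element. Reusing one opener for all three requests is what carries the cookies from step 1 into steps 2 and 3.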

Took a fair amount of trial and error to figure out exactly what was necessary for the requests to work!