bertfrees closed this 1 month ago.
This is done, except for localizing the stock message.
Note that it is not possible to post a TTS config file with the /voices/[ID] and /voices/[ID]/preview endpoints; only the global settings are used. This is a reasonable limitation, because setting properties inside a TTS config file is deprecated, and the rest of the TTS config file cannot influence the voices.
Also note that /voices/[ID] and /voices/[ID]/preview calls need to be preceded by a /voices call (this is needed to get the preview links). Settings must not be changed after the last /voices call.
This last limitation might not be RESTful, but it makes possible the caching needed to keep the previews snappy. It should be improved later, e.g. by doing the caching per client.
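The reason for the ordering constraint can be illustrated with a minimal Python sketch. It assumes a hypothetical PreviewCache class with two injected hooks, enumerate_voices and synthesize, standing in for whatever the Pipeline actually does internally; none of these names come from the Pipeline code. The voice list and the rendered audio are tied to the settings seen by the last /voices call, so a preview request without such a call, or after the settings have changed, is rejected instead of being served from a stale cache. The sketch keys the cache per client, i.e. the suggested improvement, not the current behaviour.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, List, Tuple


class PreviewNotAvailable(Exception):
    """Raised when a preview cannot be served from a valid /voices snapshot."""


@dataclass
class PreviewCache:
    # Hypothetical hooks (not the Pipeline's API): list the voices for a settings
    # snapshot, and render one preview for (settings, voice ID, text).
    enumerate_voices: Callable[[Hashable], List[str]]
    synthesize: Callable[[Hashable, str, str], bytes]
    # Per-client snapshot: client -> (settings at last /voices call, voice IDs).
    _voices: Dict[Hashable, Tuple[Hashable, List[str]]] = field(default_factory=dict, init=False)
    # Rendered audio keyed by (client, voice ID, text).
    _audio: Dict[Tuple[Hashable, str, str], bytes] = field(default_factory=dict, init=False)

    def list_voices(self, client: Hashable, settings: Hashable) -> List[str]:
        """Counterpart of GET /voices: take a fresh snapshot for this client."""
        voices = self.enumerate_voices(settings)
        self._voices[client] = (settings, voices)
        # Drop this client's audio rendered under earlier settings only.
        self._audio = {k: v for k, v in self._audio.items() if k[0] != client}
        return voices

    def preview(self, client: Hashable, settings: Hashable, voice_id: str, text: str) -> bytes:
        """Counterpart of GET /voices/[ID]/preview."""
        entry = self._voices.get(client)
        if entry is None:
            raise PreviewNotAvailable("call /voices first")
        snapshot_settings, voices = entry
        if snapshot_settings != settings:
            raise PreviewNotAvailable("settings changed since the last /voices call")
        if voice_id not in voices:
            raise PreviewNotAvailable(f"unknown voice: {voice_id}")
        key = (client, voice_id, text)
        if key not in self._audio:  # render once, then serve from the cache
            self._audio[key] = self.synthesize(settings, voice_id, text)
        return self._audio[key]
```

Because the cache is keyed per client, one client refreshing its voice list does not invalidate another client's cached previews.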
Fixed in 2be91f131
We discussed the following:
- For the UI, we looked at how other software does it. Usually a fixed string is translated into the correct language and used to generate the sample.
- Sometimes, the sample text can be changed.
- The speech rate should affect the sample.
- We should try to handle the case where the user runs through the voices quickly in the "selected voices" table. The experience shouldn't be sluggish.
The approach we're going to try first is for the UI to fetch and cache audio files for the voices the user is most likely to request next. The user might sometimes see a “fetching previews” message, but things should speed up as more previews are cached. When the user enters their own text to generate a preview, it’s OK if they have to wait a second.
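The prefetching idea can be sketched in Python as a small LRU cache that also warms the rows next to the selected one. This is a minimal sketch under assumptions: PreviewPrefetcher, fetch, ahead, behind and capacity are hypothetical names, "most likely to be requested next" is approximated as the neighbouring rows of the "selected voices" table, and fetch stands in for downloading one preview link.

```python
from collections import OrderedDict
from typing import Callable, List


class PreviewPrefetcher:
    """Hypothetical UI-side preview cache; not part of any existing Pipeline UI."""

    def __init__(self, voice_ids: List[str], fetch: Callable[[str], bytes],
                 ahead: int = 2, behind: int = 1, capacity: int = 16) -> None:
        self.voice_ids = voice_ids  # row order of the "selected voices" table
        self.fetch = fetch          # assumed to download one preview (e.g. HTTP GET)
        self.ahead = ahead          # rows after the current one to prefetch
        self.behind = behind        # rows before the current one to prefetch
        self.capacity = capacity    # maximum number of previews kept in memory
        self._cache: "OrderedDict[str, bytes]" = OrderedDict()

    def _load(self, voice_id: str) -> bytes:
        if voice_id in self._cache:
            self._cache.move_to_end(voice_id)   # mark as most recently used
            return self._cache[voice_id]
        audio = self.fetch(voice_id)            # cache miss: the user may have to wait
        self._cache[voice_id] = audio
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)     # evict the least recently used preview
        return audio

    def get(self, index: int) -> bytes:
        """Return the preview for the selected row, fetching it if needed."""
        return self._load(self.voice_ids[index])

    def prefetch_around(self, index: int) -> None:
        """Warm the cache with the rows most likely to be selected next."""
        rows = [index + d for d in range(1, self.ahead + 1)]    # following rows first
        rows += [index - d for d in range(1, self.behind + 1)]  # then preceding rows
        for row in rows:
            if 0 <= row < len(self.voice_ids):
                self._load(self.voice_ids[row])
```

In a real UI, prefetch_around would presumably run in the background (a worker thread or async requests) after the selected preview starts playing, and capacity should be comfortably larger than ahead + behind + 1 so that the current preview is not evicted by its own neighbours.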
In the web API, a preview will be a link to a WAV or MP3 file. The endpoint will be:
http://localhost:8181/ws/voices/$ID/preview?text=foo+bar&speech-rate=120%25
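A client could build this link with Python's standard urllib.parse, which produces the same encoding as the example (a space as +, % as %25). This is a minimal sketch: preview_url and BASE_URL are hypothetical names, the voice ID format is not specified here so the ID is percent-encoded as a single path segment to be safe, and the text and speech-rate parameters are left out when not given, assuming both are optional.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8181/ws"  # host and path taken from the example URL


def preview_url(voice_id: str, text: Optional[str] = None,
                speech_rate: Optional[str] = None) -> str:
    """Build a /voices/$ID/preview link (hypothetical helper)."""
    params = {}
    if text is not None:
        params["text"] = text                # urlencode turns spaces into "+"
    if speech_rate is not None:
        params["speech-rate"] = speech_rate  # "120%" becomes "120%25"
    # safe='' also encodes "/" so an unusual voice ID stays one path segment.
    url = f"{BASE_URL}/voices/{quote(voice_id, safe='')}/preview"
    return f"{url}?{urlencode(params)}" if params else url
```

For example, preview_url("v1", "foo bar", "120%") yields the same query string as the example URL, with the voice ID in place of $ID.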
The ID of the voice will be included in the result of the http://localhost:8181/ws/voices endpoint, e.g.:
The text parameter should probably be optional. If it is omitted, a stock message in the correct language will be played.
For the TTS engines that support changing the speech rate, the sample is affected by the org.daisy.pipeline.tts.speech-rate property. The speech-rate parameter is optional and can be one of the following: