Shared-Reality-Lab / IMAGE-server

IMAGE project server components

Allow control/preference for TTS or rendering speed? #17

Open jeffbl opened 3 years ago

jeffbl commented 3 years ago

Listening to that example raises a new issue as well: in screen readers, experienced users speed up the voice considerably. If we're rendering on the server, we're going to need some indication of how much to speed up the voices, and this would have to be supported by the TTS as well. For some renderings, I expect we might ignore that setting, since the focus won't be on blasting through text content as quickly as possible, but rather on creating an overall experience that may not lend itself to a sped-up voice...

Originally posted by @jeffbl in https://github.com/Shared-Reality-Lab/auditory-haptic-graphics-server/issues/7#issuecomment-834666921

jeffbl commented 3 years ago

Pulled this out of #7 - this is a longer-term issue, but I expect there will be participants who find a standard rendering speed too slow. However, just speeding things up may make things like spatialized renderings sound odd. So I expect this will have to be a preference, not something that is always applied, with the handler making the call on whether it makes sense for what it is producing.

JRegimbal commented 3 years ago

For the FastSpeech2 model used by default in the TTS branch, speed_control_alpha can be used to adjust the speed of the synthesized speech. https://github.com/Shared-Reality-Lab/auditory-haptic-graphics-server/blob/da843305d0af66e1cdc0c35e33cff389682cc705/docker/espnet/src/espnet_util.py#L29
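A minimal sketch of how a user-facing speed preference could map onto that parameter, assuming (as FastSpeech-style models typically do) that the alpha scales the predicted phoneme durations, so the effective alpha is the inverse of the requested rate. The function name, the clamping range, and the rate convention (1.0 = normal speed) are all illustrative assumptions, not part of the server API:

```python
def rate_to_alpha(rate: float, min_alpha: float = 0.25, max_alpha: float = 2.0) -> float:
    """Convert a user-facing speech-rate multiplier (1.0 = normal,
    2.0 = twice as fast) into a duration-scaling alpha.

    Assumes alpha multiplies predicted durations, so alpha is the
    inverse of the requested rate; the result is clamped to a range
    the model can plausibly synthesize without obvious artifacts
    (bounds here are a guess, not tuned values).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return max(min_alpha, min(max_alpha, 1.0 / rate))
```

A handler that decides the sped-up voice doesn't suit its rendering (per the earlier comment) could simply skip this mapping and synthesize at alpha 1.0.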

jeffbl commented 2 years ago

Note that this could potentially key off of the user's screen reader speed setting, if we can access that. This way when they adjust speed in screen reader, it could also adjust the speed of the TTS in the renderings by the same factor.
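If the client can actually read the screen reader's rate (commonly exposed as words per minute), one way to derive such a factor is to normalize against a baseline rate and forward the ratio with the request as a preference. This is only a sketch; the baseline value and function name are hypothetical, and screen readers differ in how (and whether) they expose their current rate:

```python
DEFAULT_WPM = 180.0  # hypothetical baseline; screen reader defaults vary

def screen_reader_factor(user_wpm: float, baseline_wpm: float = DEFAULT_WPM) -> float:
    """Map a screen reader speech rate (words per minute) to a relative
    speed factor that could accompany a rendering request.

    A user running their screen reader at twice the baseline rate gets
    factor 2.0; a handler may then honor or ignore it per rendering.
    """
    if user_wpm <= 0 or baseline_wpm <= 0:
        raise ValueError("rates must be positive")
    return user_wpm / baseline_wpm
```

The same factor could then feed the TTS speed control, so adjusting the screen reader voice adjusts the renderings by the same amount.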