This adds initial text-to-speech support to AI for Oceans, initially intended for internal evaluation.
It uses the browser's built-in SpeechSynthesis API, which now appears to be widely available.
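For reference, the core of the API is small. A minimal sketch (not this PR's actual code):

```js
// speechSynthesis and SpeechSynthesisUtterance are browser globals.
function speak(text, locale = 'en-US') {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.lang = locale;
  utterance.onend = () => console.log('Speech finished.');
  speechSynthesis.cancel(); // stop anything already playing
  speechSynthesis.speak(utterance);
}
```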
A variety of designs were attempted in building this PR. Currently, the user experience is this: if text-to-speech is enabled, typing sounds are not played, but the typing animation is still shown. The completion of typing, not of speech, still determines when the guide can be dismissed (partly because of a bug observed in at least one language, where the full text was not read out, meaning that the speech's completion was not a reliable signal that the guide could now be dismissed).
At least on Google Chrome on macOS, speech must be played in response to a user action. If the user clicks anywhere while the guide is typing, and before we've first played speech, we attempt to play the speech. If the user clicks anywhere after the guide has finished typing, and still before we've first played speech, we make one attempt to play the speech; if it plays, the next click dismisses the guide. Once speech has played once, subsequent displays of the guide attempt to play speech immediately. Speech is stopped whenever the guide is dismissed. Note that speech often isn't available right away, as the browser appears to load the voices asynchronously.
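The gesture gating and asynchronous voice loading might look something like this simplified, hypothetical sketch (the function names and overall flow here are illustrative, not this PR's actual code; the real logic also tracks the typing state):

```js
// Voices often load asynchronously: getVoices() can return [] at first,
// so wait for the voiceschanged event before trusting the list.
function getVoicesAsync() {
  return new Promise(resolve => {
    const voices = speechSynthesis.getVoices();
    if (voices.length > 0) {
      resolve(voices);
    } else {
      speechSynthesis.addEventListener(
        'voiceschanged',
        () => resolve(speechSynthesis.getVoices()),
        {once: true}
      );
    }
  });
}

// Chrome (at least on macOS) requires a user gesture before speech will play,
// so the first click attempts speech rather than dismissing the guide.
let hasPlayedSpeech = false;
function onGuideClick(guideText, dismissGuide) {
  if (!hasPlayedSpeech) {
    const utterance = new SpeechSynthesisUtterance(guideText);
    utterance.onstart = () => (hasPlayedSpeech = true);
    speechSynthesis.speak(utterance);
  } else {
    speechSynthesis.cancel(); // stop speech whenever the guide is dismissed
    dismissGuide();
  }
}
```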
We use the helpful https://github.com/HadrienGardeur/web-speech-recommended-voices project to determine which voice to play. Currently, we consume its JSON for English and Italian and use the first available match. Its demo at https://hadriengardeur.github.io/web-speech-recommended-voices/demo/ is helpful for hearing the variety of recommended voices for each language.
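A hedged sketch of that matching step; the raw JSON URL and the field names (`voices`, `name`) are assumptions based on the repo's published files, not verified against this PR:

```js
// Fetch a recommended-voices list and pick the first voice the browser
// actually has installed; entries are assumed to be ordered by quality.
async function pickRecommendedVoice(lang = 'en') {
  const url = `https://raw.githubusercontent.com/HadrienGardeur/web-speech-recommended-voices/main/json/${lang}.json`;
  const recommended = (await (await fetch(url)).json()).voices;
  const available = speechSynthesis.getVoices();
  for (const rec of recommended) {
    const match = available.find(v => v.name === rec.name);
    if (match) return match;
  }
  return null; // fall back to the browser's default voice
}
```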
In this initial work, the standalone text-to-speech can be heard in English by loading the page with the appropriate URL parameter:
https://github.com/code-dot-org/ml-activities/assets/2205926/e5debc47-2f74-4700-9c9d-e8e1385310c5
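For illustration, reading such a flag might look like the sketch below; the parameter name `textToSpeech` is hypothetical, as the actual parameter isn't shown here:

```js
// Enable text-to-speech only when the (hypothetical) query parameter is set,
// e.g. ?textToSpeech=true
const params = new URLSearchParams(window.location.search);
const textToSpeechEnabled = params.get('textToSpeech') === 'true';
```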