HadrienGardeur / web-speech-recommended-voices

A list of recommended voices for the Web Speech API
https://panac.github.io/readium-speech/demo/
Creative Commons Zero v1.0 Universal

Recommended voices for the Web Speech API

This repository is part of a larger project, meant to identify best practices for implementing a read aloud feature in reading apps.

With hundreds of voices available by default across browsers and operating systems, it can be tricky for developers to provide sensible defaults and a curated list of voices.

With its focus on voice selection, the goal of this project is to document higher quality voices available on various platforms and provide an easy way to implement these recommendations using JSON configuration files.

Use cases

Demo

A live demo is available to test which recommended voices are available in your browser.

List of supported languages

List of voices to filter out

At the other end of the spectrum, this project also identifies a number of voices that should be filtered out from a voice selector component.

Some of them are harmful to the overall reading experience, while others have a very low quality on platforms where better preloaded options are available.

Guiding principles

Syntax

A JSON Schema is available for validation, and for potential contributors interested in opening a PR for new languages or voice additions.

Label

label is required for each recommended voice and provides a human-friendly name for it.

This string is localized for the target language and, judging from the examples below, usually contains a first name or short description followed by the country or region in parentheses.

Example 1: Microsoft Natural voices

While the names documented by Microsoft for their natural voices are easily understandable, they tend to be very long and they're all localized in English.

{
  "label": "Isabella (Italia)",
  "name": "Microsoft Isabella Online (Natural) - Italian (Italy)",
  "language": "it-IT"
}

Example 2: Chrome OS voices

Chrome OS provides a number of high quality voices through its Android subsystem, but they come with some of the worst possible names for an end user.

{
  "label": "Female voice 1 (US)",
  "name": "Android Speech Recognition and Synthesis from Google en-us-x-tpc-network",
  "language": "en-US"
}

Names

name is required for each recommended voice and it's used as the main identifier for voices in this project.

Names are mostly stable across browsers, which means that for most voices, a single string is sufficient.

But there are unfortunately some outliers: Android, iOS, iPadOS and macOS voices.

For those voices, at least a portion of the string is often localized, naming can be inconsistent across browsers, and names can change depending on the number of variants installed.

Because of this, each list can also contain additional naming properties, such as localizedName and altNames in the example below.

Example 3: Alternate version of an Apple preloaded voice

{
  "label": "Samantha (US)",
  "name": "Samantha",
  "localizedName": "apple",
  "altNames": [
    "Samantha (Enhanced)",
    "Samantha (English (United States))"
  ],
  "language": "en-US"
}
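
As a minimal sketch of how name and altNames could be used together, the snippet below looks up a browser voice in a list of recommended entries. The RecommendedVoice and BrowserVoice interfaces and the findRecommendation helper are hypothetical names introduced for illustration; they only model the properties shown in the examples above, and localizedName is deliberately left out because its handling isn't described here.

```typescript
// Hypothetical shape: only the properties shown in the examples above.
interface RecommendedVoice {
  label: string;
  name: string;
  altNames?: string[];
  language: string;
}

// Minimal stand-in for the browser's SpeechSynthesisVoice (name and lang only).
interface BrowserVoice {
  name: string;
  lang: string;
}

// Return the first entry whose name or one of its altNames equals the
// browser voice name exactly, or undefined when nothing matches.
function findRecommendation(
  voice: BrowserVoice,
  recommended: RecommendedVoice[],
): RecommendedVoice | undefined {
  for (const entry of recommended) {
    const alternates = entry.altNames ?? [];
    if (entry.name === voice.name || alternates.indexOf(voice.name) !== -1) {
      return entry;
    }
  }
  return undefined;
}

const samantha: RecommendedVoice = {
  label: "Samantha (US)",
  name: "Samantha",
  altNames: ["Samantha (Enhanced)", "Samantha (English (United States))"],
  language: "en-US",
};

const enhanced = findRecommendation(
  { name: "Samantha (Enhanced)", lang: "en-US" },
  [samantha],
);
console.log(enhanced?.label); // "Samantha (US)"
```

Exact string comparison keeps the sketch predictable; a real implementation would probably need extra handling for the localized portions of Apple and Android voice names.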

Languages

language is required for each recommended voice.

It contains a BCP 47 language tag where a downcased two-letter language code is followed by an uppercased two-letter country code.

The language and country codes are separated using a hyphen (-).
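
The shape described above can be checked with a short regular expression. This is only a sketch of that specific shape (two lowercase letters, a hyphen, two uppercase letters), not a full BCP 47 validator, and the hasExpectedShape helper is a hypothetical name.

```typescript
// Two lowercase letters, a hyphen, then two uppercase letters.
const LANGUAGE_TAG_SHAPE = /^[a-z]{2}-[A-Z]{2}$/;

// True when the tag follows the language-COUNTRY shape used in this project.
function hasExpectedShape(tag: string): boolean {
  return LANGUAGE_TAG_SHAPE.test(tag);
}

console.log(hasExpectedShape("it-IT")); // true
console.log(hasExpectedShape("en_us")); // false
```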

Some voices are also capable of handling another language. For example, a Spanish voice for the United States might also be capable of handling English.

For this reason, an otherLanguages property is also available, although it is rarely used right now.

It contains a list of languages using only two-letter codes, without a sub-tag.

Some brand new voices from Microsoft are also capable of multilingual output. Switching languages in the middle of a sentence isn't supported, but the output seems capable of auto-detecting the language of each sentence and adapting accordingly.

In order to support this, the output might automatically switch to a different voice in the process.

These voices are identified using the multiLingual boolean.

Example 4: Voice with a multilingual output

{
  "label": "Emma (US)",
  "name": "Microsoft EmmaMultilingual Online (Natural) - English (United States)",
  "language": "en-US",
  "multiLingual": true
}

Example 5: Voice capable of handling a secondary language

{
  "label": "Sylvie (Canada)",
  "name": "Microsoft Sylvie Online (Natural) - French (Canada)",
  "language": "fr-CA",
  "otherLanguages": [
    "en"
  ]
}
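
One way to use language together with the secondary language list is sketched below. It assumes that matching on the two-letter language code alone is acceptable (the country code is ignored), which is a design choice of this sketch rather than something the project prescribes. The VoiceLanguages interface and both helpers are hypothetical names, and the multiLingual flag is not interpreted here.

```typescript
// Hypothetical shape: the language-related properties from Example 5.
interface VoiceLanguages {
  language: string; // e.g. "fr-CA"
  otherLanguages?: string[]; // two-letter codes, e.g. ["en"]
}

// Extract the language code from a tag: "fr-CA" -> "fr".
function primaryLanguage(tag: string): string {
  return tag.split("-")[0].toLowerCase();
}

// True when the voice's main language or one of its secondary
// languages shares the language code of the requested tag.
function canRead(voice: VoiceLanguages, requested: string): boolean {
  const wanted = primaryLanguage(requested);
  if (primaryLanguage(voice.language) === wanted) {
    return true;
  }
  return (voice.otherLanguages ?? []).some(
    (code) => code.toLowerCase() === wanted,
  );
}

const sylvie: VoiceLanguages = { language: "fr-CA", otherLanguages: ["en"] };

console.log(canRead(sylvie, "fr-FR")); // true
console.log(canRead(sylvie, "en-US")); // true
console.log(canRead(sylvie, "de-DE")); // false
```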

Gender and children voices

gender is an optional property for each voice that documents the gender associated with it.

The following values are supported: female, male or neutral.

children is also optional and identifies children's voices using a boolean.

Example 6: Female child voice

{
  "label": "Ana (US)",
  "name": "Microsoft Ana Online (Natural) - English (United States)",
  "language": "en-US",
  "gender": "female",
  "children": true
}

Quality

quality is an optional property for each voice that documents the quality of its variants.

The following values are supported:

veryHigh
Very high, almost human-indistinguishable quality of speech synthesis
high
High, human-like quality of speech synthesis
normal
Normal quality of speech synthesis
low
Low, not human-like quality of speech synthesis
veryLow
Very low, but still intelligible quality of speech synthesis

Example 7: An Apple voice available in three quality variants

{
  "label": "Ava (US)",
  "name": "Ava",
  "note": "This voice can be installed on all Apple devices and offers three variants. Like all voices that can be installed on Apple devices, it suffers from inconsistent naming due to localization.",
  "altNames": [
    "Ava (Premium)",
    "Ava (Enhanced)",
    "Ava (English (United States))"
  ],
  "language": "en-US",
  "gender": "female",
  "quality": [
    "low",
    "normal",
    "high"
  ],
  "rate": 1,
  "pitch": 1,
  "os": [
    "macOS",
    "iOS",
    "iPadOS"
  ]
}
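
The five quality values form a natural ordering, so a voice selector could rank entries by their best available variant. The sketch below assumes quality may appear either as a single value or as a list of variants (as in Example 7); the Quality type, QUALITY_ORDER constant and bestQuality helper are hypothetical names.

```typescript
type Quality = "veryLow" | "low" | "normal" | "high" | "veryHigh";

// Values ordered from worst to best, following the descriptions above.
const QUALITY_ORDER: Quality[] = ["veryLow", "low", "normal", "high", "veryHigh"];

// Return the highest quality among the documented variants,
// or undefined when no quality is documented.
function bestQuality(
  quality: Quality | Quality[] | undefined,
): Quality | undefined {
  if (quality === undefined) {
    return undefined;
  }
  const values = Array.isArray(quality) ? quality : [quality];
  let best: Quality | undefined;
  for (const value of values) {
    if (
      best === undefined ||
      QUALITY_ORDER.indexOf(value) > QUALITY_ORDER.indexOf(best)
    ) {
      best = value;
    }
  }
  return best;
}

console.log(bestQuality(["low", "normal", "high"])); // "high"
```

Note that this only reports the best documented variant; it says nothing about which variant is actually installed on the user's device.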

OS and browser

Both os and browser are optional properties. They're used to indicate in which operating systems and browsers a voice is available.

These two properties are meant to be interpreted separately and not as a combination.

Example 8: A Microsoft voice available in both Edge and Windows

{
  "label": "Denise (France)",
  "name": "Microsoft Denise Online (Natural) - French (France)",
  "note": "This voice is preloaded in Edge on desktop. In other browsers, it requires the user to run Windows 11 and install the voice pack.",
  "language": "fr-FR",
  "gender": "female",
  "os": [
    "Windows"
  ],
  "browser": [
    "Edge"
  ]
}

In addition, preloaded indicates whether the voice is preloaded in all the OS and browsers that have been identified.

With the current approach, it's not possible to express mixed cases, for example a voice that is available in both Chrome and Windows but requires a download on Windows.

Example 9: A Google voice preloaded in Chrome Desktop

{
  "label": "Google female voice (UK)",
  "name": "Google UK English Female",
  "language": "en-GB",
  "gender": "female",
  "browser": [
    "ChromeDesktop"
  ],
  "preloaded": true
}
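
Because os and browser are interpreted separately, a match on either one is enough to consider a voice as potentially available. The sketch below illustrates that reading. The Availability and Environment interfaces and the mayBeAvailable helper are hypothetical, the caller is expected to detect the current OS and browser on its own, and treating an entry that lists neither property as unrestricted is an assumption of this sketch.

```typescript
// Hypothetical shape: the availability properties from Examples 8 and 9.
interface Availability {
  os?: string[];
  browser?: string[];
  preloaded?: boolean;
}

// Current environment, detected elsewhere by the caller.
interface Environment {
  os: string;
  browser: string;
}

// True when the current OS or the current browser is listed.
// The two lists are checked independently, not as a combination.
function mayBeAvailable(voice: Availability, env: Environment): boolean {
  // Assumption: an entry listing neither property is not ruled out.
  if (voice.os === undefined && voice.browser === undefined) {
    return true;
  }
  const osMatch = (voice.os ?? []).indexOf(env.os) !== -1;
  const browserMatch = (voice.browser ?? []).indexOf(env.browser) !== -1;
  return osMatch || browserMatch;
}

const denise: Availability = { os: ["Windows"], browser: ["Edge"] };

console.log(mayBeAvailable(denise, { os: "macOS", browser: "Edge" })); // true
console.log(mayBeAvailable(denise, { os: "Windows", browser: "Firefox" })); // true
console.log(mayBeAvailable(denise, { os: "macOS", browser: "Safari" })); // false
```

A positive result only means the voice is worth looking for; as the note in Example 8 shows, it may still require the user to install a voice pack.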

Speech rate and pitch

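When using the Web Speech API, SpeechSynthesisUtterance supports optional values for rate and pitch.

Each voice documented in this repo can carry matching optional properties: rate and pitch for recommended values, and pitchControl to flag voices where the pitch cannot be adjusted, as in the examples below.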

Example 10: Microsoft voice where the pitch cannot be adjusted

{
  "label": "Ana (US)",
  "name": "Microsoft Ana Online (Natural) - English (United States)",
  "language": "en-US",
  "gender": "female",
  "pitchControl": false
}

Example 11: Google voice with recommended pitch and speed rates

{
  "label": "Voix Google féminine (France)",
  "name": "Google français",
  "language": "fr-FR",
  "gender": "female",
  "rate": 1,
  "pitch": 0.8
}
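
A minimal sketch of applying these properties to an utterance follows. UtteranceLike is a hypothetical stand-in for the rate and pitch fields of SpeechSynthesisUtterance so that the snippet also runs outside a browser, and VoiceTuning and applyTuning are hypothetical names. Skipping the pitch when pitchControl is false is this sketch's interpretation of the flag.

```typescript
// Hypothetical shape: the tuning properties from Examples 10 and 11.
interface VoiceTuning {
  rate?: number;
  pitch?: number;
  pitchControl?: boolean;
}

// Stand-in for the rate and pitch fields of SpeechSynthesisUtterance.
interface UtteranceLike {
  rate: number;
  pitch: number;
}

// Copy the recommended rate and pitch onto the utterance.
// The pitch is left untouched when the entry says it cannot be adjusted.
function applyTuning(utterance: UtteranceLike, voice: VoiceTuning): void {
  if (voice.rate !== undefined) {
    utterance.rate = voice.rate;
  }
  if (voice.pitch !== undefined && voice.pitchControl !== false) {
    utterance.pitch = voice.pitch;
  }
}

// Both values default to 1 on a real SpeechSynthesisUtterance.
const utterance: UtteranceLike = { rate: 1, pitch: 1 };
applyTuning(utterance, { rate: 1, pitch: 0.8 });
console.log(utterance.rate, utterance.pitch); // 1 0.8
```

In a browser, the same function should accept an actual SpeechSynthesisUtterance, since it exposes numeric rate and pitch properties, before passing it to speechSynthesis.speak().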

Additional notes

Through the work done to document a list of recommended voices, I also ended up testing various browsers/OS to see how they behave. This section is meant to summarize some of this information.

A dedicated label is also available to track external issues reported to Apple, Google, Microsoft or Mozilla.

General

Android

Chrome Desktop

Chrome OS

Edge

Firefox

iOS and iPadOS

macOS

Safari

Windows