WilcoFiers closed this issue 4 years ago
Here's some data about how lang codes work: https://qa.powermapper.com/Tests/ACT-R/primary-ext-lang.html
The nl-NL (male) and nl-BE (female) pronunciation in VoiceOver sounds very similar to me - but I'm not a native speaker, so might be worth double-checking.
After looking at some browser source code I think the language switch is implementation-dependent: it depends on both the accessibility API being used and how the speech synthesiser used by the AT works.
For example, NVDA supports language switching with the eSpeak synthesiser, but needs additional Windows language packs installed to support switching with the Windows OneCore synthesiser.
Here's how accessibility APIs expose language to screen readers:
IAccessible2 used by Firefox/Orca only supports language/country pairs (e.g. fr-CA) https://accessibility.linuxfoundation.org/a11yspecs/ia2/docs/html/struct_i_a2_locale.html
UIA used on Windows only supports LCIDs: https://docs.microsoft.com/en-us/windows/win32/winauto/uiauto-textattribute-ids#UIA_CultureAttributeId These mostly correspond to language/country pairs, but sometimes carry additional BCP 47 info: https://docs.microsoft.com/en-us/openspecs/office_standards/ms-oe376/6c085406-a698-4e12-9d4d-c3b0ee3dbc4a Note: only a limited number of BCP 47 lang codes can be translated to LCIDs; they are listed at that same link.
Apple's documentation has examples and rules here: https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPInternational/LanguageandLocaleIDs/LanguageandLocaleIDs.html Note: the page ends with this warning:
be sure to follow the rules found in sections 2.2.1 and 4.5 of BCP 47: Tags for Identifying Languages. Tags that do not follow these conventions are not guaranteed to work
espeak-ng uses BCP-47 with some extensions. The HowTo for adding a new language shows how this works: https://github.com/espeak-ng/espeak-ng/blob/master/docs/add_language.md https://github.com/espeak-ng/espeak-ng/blob/master/docs/voices.md
eSpeak has a data directory full of config files which control how language choice works. The config files are searched for the closest matching language (e.g. es-419 = es-mx = South American Spanish, es = European Spanish). This means lang=es-mx (Mexico) selects South American Spanish, but lang=es-ar (Argentina) selects European Spanish because es-ar isn't listed in the config files. This can change at any time if the config files are updated.
https://github.com/espeak-ng/espeak-ng/tree/master/espeak-ng-data/lang/roa
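To make the fallback behaviour above concrete, here is a minimal sketch of a "closest match" lookup done by dropping trailing subtags. The `VOICES` table and the `closest_voice` helper are hypothetical names made up for illustration. The table only mirrors the example in this comment, and eSpeak's real matching logic may well differ.

```python
# Hypothetical voice table mirroring the example above. It is NOT read from
# eSpeak's data directory and may not match its current contents.
VOICES = {
    "es": "European Spanish",
    "es-419": "South American Spanish",
    "es-mx": "South American Spanish",
}


def closest_voice(lang, voices=VOICES):
    """Return the voice for the longest listed prefix of lang, or None."""
    subtags = lang.lower().split("-")
    # Try the full tag first, then drop trailing subtags one at a time.
    while subtags:
        candidate = "-".join(subtags)
        if candidate in voices:
            return voices[candidate]
        subtags.pop()
    return None
```

With this table, `closest_voice("es-mx")` finds an exact entry, while `closest_voice("es-ar")` falls back to the plain `es` entry, which is the surprising result described above.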
@WilcoFiers - Was there a reason this issue was closed?
@dd8, thanks for the extensive notes. Helpful 👍
Currently the "valid language" rules require that the language tags follow the BCP-47 syntax. @kengdoj pointed out to the ACT Task Force that this might be too strict. Looking at what browsers are doing today, CSS 2.2 and CSS 3 have a simpler algorithm defined for matching language tags. We could consider adopting this. There are some drawbacks to this, mainly that grandfathered tags may be more difficult to match.
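For reference, the CSS 2 style of matching is roughly a case-insensitive prefix test: the element's language matches if it equals the identifier, or starts with the identifier immediately followed by "-". A minimal sketch, where `css2_lang_matches` is a name made up for illustration and not taken from any browser:

```python
def css2_lang_matches(element_lang, identifier):
    """CSS 2 style :lang() test: equal, or a prefix followed by '-'."""
    lang = element_lang.lower()
    ident = identifier.lower()
    # "fr" matches "fr" and "fr-CA", but not "frm" (no hyphen after the prefix).
    return lang == ident or lang.startswith(ident + "-")
```

A grandfathered tag such as `i-klingon` shows the drawback: the only prefix available is `i`, which is not a language on its own, so prefix matching says little about what the tag means.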
Another approach could be to look at RFC 4647, which is specifically about matching language tags. It is used in CSS 4 in the :lang() pseudo class. There are some parts to that (like wildcards "*") which do not work for what we are doing with rules though, so we'd have to deal with that.
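As a rough idea of what RFC 4647 matching involves, here is a sketch of its "extended filtering" scheme (section 3.3.2), written from my reading of the RFC. `extended_filter` is a hypothetical helper, not an existing library function. The wildcard handling is kept for completeness; a rule could simply reject any range containing "*" before calling it.

```python
def extended_filter(language_range, tag):
    """Sketch of RFC 4647 section 3.3.2 extended filtering."""
    range_parts = language_range.lower().split("-")
    tag_parts = tag.lower().split("-")

    # The first subtags must match, unless the range starts with a wildcard.
    if range_parts[0] != "*" and range_parts[0] != tag_parts[0]:
        return False

    r, t = 1, 1
    while r < len(range_parts):
        if range_parts[r] == "*":
            r += 1            # wildcard: move on to the next range subtag
        elif t >= len(tag_parts):
            return False      # the tag ran out before the range did
        elif range_parts[r] == tag_parts[t]:
            r += 1            # subtags match: advance both
            t += 1
        elif len(tag_parts[t]) == 1:
            return False      # a singleton such as "x" stops the scan
        else:
            t += 1            # skip an unmatched, non-singleton tag subtag
    return True
```

Using the RFC's own example range, `extended_filter("de-*-DE", "de-Latn-DE")` matches, while `extended_filter("de-*-DE", "de-x-DE")` does not.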