act-rules / act-rules.github.io

Accessibility conformance testing rules for HTML
https://act-rules.github.io/

Make lang matching less strict (bf051a, de46e4) #1105

Closed WilcoFiers closed 4 years ago

WilcoFiers commented 4 years ago

Currently the "valid language" rules require that language tags follow the BCP-47 syntax. @kengdoj pointed out to the ACT Task Force that this might be too strict. Looking at what browsers are doing today, CSS 2.2 and CSS 3 define a simpler algorithm for matching language tags. We could consider adopting this. There are some drawbacks, mainly that grandfathered tags may be more difficult to match.

Another approach could be to look at RFC 4647, which is specifically about matching language tags. It is used in CSS 4 in the :lang() pseudo-class. Some parts of it (like the wildcard "*") do not work for what we are doing with rules though, so we'd have to decide how to handle those.
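To make the matching idea concrete, here is a minimal sketch of RFC 4647 basic filtering (section 3.3.1), which is a case-insensitive prefix comparison on subtag boundaries. The function name `basic_filter_match` is hypothetical, and rejecting the wildcard range is an assumption made for this sketch, not a decision recorded in this issue.

```python
def basic_filter_match(language_range: str, tag: str) -> bool:
    """Sketch of RFC 4647 basic filtering, with wildcard ranges rejected.

    A range matches a tag when, ignoring case, the two are equal or the
    range is a prefix of the tag and the next character is a hyphen.
    """
    rng = language_range.strip().lower()
    candidate = tag.strip().lower()
    # Assumption for this sketch: "*" and empty ranges never match.
    if not rng or rng == "*":
        return False
    return candidate == rng or candidate.startswith(rng + "-")


if __name__ == "__main__":
    for rng, tag in [("en", "en-US"), ("en", "eng"), ("de-CH", "de-ch-1996")]:
        print(rng, tag, basic_filter_match(rng, tag))
```

The comparison works on whole subtags, so the range "en" matches "en-US" but not "eng".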

dd8 commented 4 years ago

Here's some data about how lang codes work: https://qa.powermapper.com/Tests/ACT-R/primary-ext-lang.html

The nl-NL (male) and nl-BE (female) pronunciations in VoiceOver sound very similar to me, but I'm not a native speaker, so it might be worth double-checking.

dd8 commented 4 years ago

After looking at some browser source code I think the language switch is implementation dependent, and depends on both the accessibility API being used and how the speech synthesiser used by the AT works.

For example, NVDA supports language switching with the eSpeak synthesiser, but needs additional Windows language packs installed to support switching with the Windows OneCore synthesiser.

Here's how accessibility APIs expose language to screen readers:

IAccessible2 uses language-country

IAccessible2, as used by Firefox/Orca, only supports language/country pairs (e.g. fr-CA): https://accessibility.linuxfoundation.org/a11yspecs/ia2/docs/html/struct_i_a2_locale.html

UIA uses LCIDs

UIA, as used on Windows, only supports LCIDs: https://docs.microsoft.com/en-us/windows/win32/winauto/uiauto-textattribute-ids#UIA_CultureAttributeId

These mostly correspond to language/country pairs, but sometimes carry additional BCP 47 info: https://docs.microsoft.com/en-us/openspecs/office_standards/ms-oe376/6c085406-a698-4e12-9d4d-c3b0ee3dbc4a

Note: only a limited number of BCP 47 lang codes can be translated to LCIDs. They are listed here: https://docs.microsoft.com/en-us/openspecs/office_standards/ms-oe376/6c085406-a698-4e12-9d4d-c3b0ee3dbc4a
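As a rough illustration of why only some tags survive the trip through an LCID-based API, the sketch below looks a tag up in a small table. The table name, the function name `tag_to_lcid`, and the choice of entries are assumptions made for illustration. The handful of LCID values shown are well-known Windows values, but the table is only a sample and not the list linked above.

```python
from typing import Optional

# Hypothetical sample table: a few well-known Windows LCIDs keyed by
# lower-cased language-country tag. The real list is far longer.
SAMPLE_LCIDS = {
    "en-us": 0x0409,
    "en-gb": 0x0809,
    "fr-fr": 0x040C,
    "fr-ca": 0x0C0C,
    "nl-nl": 0x0413,
    "nl-be": 0x0813,
}


def tag_to_lcid(tag: str) -> Optional[int]:
    """Return the LCID for a tag in the sample table, or None if absent."""
    return SAMPLE_LCIDS.get(tag.strip().lower())


if __name__ == "__main__":
    for tag in ["fr-CA", "nl-BE", "de-CH-1996"]:
        print(tag, tag_to_lcid(tag))
```

A `None` result here only means the tag is missing from this sample table. It says nothing about whether the real list can represent it.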

macOS uses locale IDs which are BCP-47

There are examples and rules here: https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPInternational/LanguageandLocaleIDs/LanguageandLocaleIDs.html

Note: the page ends with this warning:

be sure to follow the rules found in sections 2.2.1 and 4.5 of BCP 47: Tags for Identifying Languages. Tags that do not follow these conventions are not guaranteed to work

dd8 commented 4 years ago

NVDA speech synthesisers

eSpeak

espeak-ng uses BCP-47 with some extensions. The HowTo for adding a new language shows how this works: https://github.com/espeak-ng/espeak-ng/blob/master/docs/add_language.md and https://github.com/espeak-ng/espeak-ng/blob/master/docs/voices.md

eSpeak has a data directory full of config files which control how language choice works. The config files are searched for the closest matching language (e.g. es-419 = es-mx = South American Spanish, es = European Spanish). This means lang=es-mx (Mexico) selects South American Spanish, but lang=es-ar (Argentina) selects European Spanish because es-ar isn't listed in the config files. This can change at any time if the config files are updated. https://github.com/espeak-ng/espeak-ng/tree/master/espeak-ng-data/lang/roa
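The closest-match behaviour described above can be sketched as a fallback lookup that drops trailing subtags until a configured entry is found. This is not eSpeak's actual code: the table `HYPOTHETICAL_VOICES` and the function `pick_voice` are made up for illustration, and the table only mirrors the three entries mentioned in this comment.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the config files, using only the entries
# mentioned in the comment above.
HYPOTHETICAL_VOICES: Dict[str, str] = {
    "es": "European Spanish",
    "es-419": "South American Spanish",
    "es-mx": "South American Spanish",
}


def pick_voice(tag: str, voices: Dict[str, str] = HYPOTHETICAL_VOICES) -> Optional[str]:
    """Return the voice for the closest configured tag, or None."""
    parts = tag.strip().lower().split("-")
    while parts:
        key = "-".join(parts)
        if key in voices:
            return voices[key]
        parts.pop()  # drop the last subtag and retry
        # As in RFC 4647 lookup, also drop a dangling one-character subtag.
        if parts and len(parts[-1]) == 1:
            parts.pop()
    return None


if __name__ == "__main__":
    for tag in ["es-MX", "es-AR", "fr"]:
        print(tag, pick_voice(tag))
```

With this table, es-MX resolves through its own entry, while es-AR has no entry and falls back to plain es, which matches the behaviour reported above.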

jeeyyy commented 4 years ago

@WilcoFiers - Was there a reason this issue was closed?

@dd8, thanks for the extensive notes. Helpful 👍