Inclusio-Community / json-image-metadata

A specification for expressing technical image metadata, with an emphasis on accessibility of data visualizations

Conditional Announcement Voicing #20

Open shepazu opened 3 months ago

shepazu commented 3 months ago

Note: the context for this issue is in #18.

Reference image: E006-Plant_Cell_Structure-experimental.svg (experimental)

We might want to have conditional voicing for some features, to avoid overwhelming the reader. A concrete example is the cytoplasm: it’s everywhere inside the cell that isn’t another cell structure. Most accessible viewing apps will trigger an announcement of the shape’s label whenever you move your finger (or cursor) onto that shape. So by default, any time you move back to the cytoplasm from, say, a chloroplast or the vacuole, it will announce the word “cytoplasm”.

It can get worse: viewing apps are likely to have two different triggers for voicing the long description, doubletap or delay. With doubletap (or longpress), the user explicitly invokes the description; with delay, the label is read first, and then, if the user lingers over that same shape for a second or two, the description is voiced. Since the cytoplasm covers so much area, there’s a good chance that as you move across it toward another shape, not only will the label be read each time you transit through the cytoplasm, but the long description may be voiced as well. This could make for an overwhelming experience, making it harder for users to tell when they’ve moved to a new shape, and perhaps even causing anxiety and “rushing” finger movements, which are not conducive to kinesthetic exploration.

On the other hand, you could just not label it… but I think that’s a poor solution, since the cytoplasm is an important structure.

My proposed solution is an addition to the JIM that allows authors to specify when announcements should be made for any given element (selector). The default would be that the label is always read out, but certain elements, like the cytoplasm, would be read only the first time they’re encountered. This would apply for any given “touch session”: raising the finger off the touch surface would reset that parameter, so users could easily reorient themselves when they place their finger back on the surface. Explicit actions, like doubletap, would still read out the label and description, even for “suppressed” elements. A rough sketch of what this might look like follows.
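As a strawman, the addition could be a hypothetical `announcements` section with per-selector rules. None of these property names or values are in the current spec; they’re placeholders to make the idea concrete:

```json
{
  "announcements": {
    "defaults": {
      "label": "always",
      "description": "on-dwell"
    },
    "rules": [
      {
        "selector": "#cytoplasm",
        "label": "once-per-touch-session",
        "description": "on-request-only"
      }
    ]
  }
}
```

Here, `once-per-touch-session` would mean the label is voiced the first time the pointer enters the shape after touch-down, then suppressed until the finger is lifted, and `on-request-only` would suppress the dwell-triggered long description while still honoring an explicit doubletap or longpress.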