erkyrath / glkote

A Javascript display library for IF interfaces
http://eblong.com/zarf/glk/glkote.html
MIT License

Possible improvement for screen readers #31

Open curiousdannii opened 5 years ago

curiousdannii commented 5 years ago

@dfabulich wrote

Lectrote is only slightly better in this sense: it uses an HTML renderer, so it can tell VoiceOver about individual paragraphs. When you type a command, you remain focused on the command area, and you have to navigate backward through the text to find the new text and read that. But at least on Lectrote it is possible to navigate paragraph by paragraph. (Lectrote does mark the transcript window as a live region, but macOS VoiceOver doesn't appear to notice.)

I wonder whether all that would be needed to improve this is to stop GlkOte from manually focusing the input element (behind an option; the current behavior is probably best for users without screen readers).
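As a rough sketch, the option could simply gate the existing focus call. The function and option names below are hypothetical, not part of the current GlkOte API:

```javascript
// Sketch of an option-gated focus call. `no_autofocus` is a
// hypothetical option name, not an existing GlkOte setting.
function focus_input(inputel, options) {
  // Skip the manual focus when the user has opted out (e.g. for
  // screen-reader use); otherwise behave as GlkOte does today.
  if (options && options.no_autofocus)
    return false;
  inputel.focus();
  return true;
}
```

Everything else stays the same; the only change is that screen-reader users can keep their reading position instead of being yanked to the input.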

dfabulich commented 5 years ago

To be clear, this bug doesn't affect iOS VoiceOver. I can double-check, but I'm pretty sure it doesn't affect popular Windows screen readers, either. This led me to conclude that macOS VoiceOver is just buggy, and there's not much we can do.

But if we were to do something, instead of avoiding focusing the text field, I would suggest this change: currently when the user submits the <input> text field, it's removed from the DOM, and a new <input> is created at the bottom of the screen. This necessitates focusing on the new <input>. Instead, I suggest reusing the <input> field, so it's already focused, and doesn't need refocusing.

I sense that this probably won't work if the <input> field is being moved around the DOM, i.e. if it's being reattached with attachChild, but if the <input> box is just pinned to the footer of the screen, then it doesn't need to move, and it should behave normally.
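The reuse idea could look something like the sketch below, assuming the input is pinned in place rather than reattached each turn. The function name and the array-based transcript are stand-ins for illustration, not GlkOte's actual internals:

```javascript
// Sketch of reusing a single <input> across turns instead of
// destroying and recreating it. `submit_line` and the plain-array
// transcript are hypothetical stand-ins for GlkOte's real machinery.
function submit_line(inputel, transcript) {
  const cmd = inputel.value;
  // Echo the submitted command into the transcript as static text...
  transcript.push('>' + cmd);
  // ...then clear the field for the next turn. The element is never
  // removed from the DOM, so it keeps focus and nothing needs refocusing.
  inputel.value = '';
  return cmd;
}
```

The key property is that the focused element's identity never changes, so the screen reader's focus never has to be guessed at or restored.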

But I also sense that this could be a lot of work.

dfabulich commented 5 years ago

I wrote this in email a while back.

One of the main compatibility issues with screen readers and parser-based interactive fiction on the web is that the UI paradigm for these games is very unusual.

Consider this example game. http://eblong.com/zarf/glulx/quixe/quixe/play-remote.html?story=stories/Advent.ulx.js

Below, I quote a typical transcript of play:

At End Of Road
You are standing at the end of a road before a small brick building. Around you is a forest. A small stream flows out of the building and down a gully.

>east

Inside Building
You are inside a building, a well house for a large spring.

There are some keys on the ground here.

There is tasty food here.

There is a shiny brass lamp nearby.

There is an empty bottle here.

>examine lamp
It is a shiny brass lamp. It is not currently lit.

>get lamp
Taken.

The feeling of play is conversational. You might imagine that the game is playing the role of a Dungeon Master in Dungeons and Dragons, where the player and game are taking turns speaking.

If I were coding a "conversational" game today, I'd probably take examples from chat-based apps like Slack, where there's an input box at the bottom of the screen, and a transcript of messages in the main part of the window.

But that is not at all how IF games work (nor how they have ever worked historically). Instead of having a separate "input box" into which you type commands, there's just one main window area, the "transcript." The game prints messages into the transcript, then prints a prompt symbol (the ">" symbol) and lets the user type directly into the transcript.

To extend my Dungeons and Dragons metaphor, you might imagine the Dungeon Master writing something on a piece of paper, then handing that same piece of paper over to the player, who would write something at the bottom and then hand the paper back to the Dungeon Master, taking turns writing on the page.

This can be confusing to screen readers. One single rectangle is acting as both output and input. Inform/Quixe has handled this by marking the entire transcript area with "aria-live=polite," meaning that any new text that appears in the window is automatically read aloud.
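The live-region setup amounts to a couple of attributes on the transcript element. A minimal sketch, written as a helper so it isn't tied to any particular element id (the function name is made up; only the attribute values reflect what Inform/Quixe actually sets):

```javascript
// Minimal sketch: mark a transcript element as a polite ARIA live
// region, as Inform/Quixe does. The function name is hypothetical.
function mark_live_region(el) {
  // "polite" asks the screen reader to announce new text after it
  // finishes its current utterance; "assertive" would interrupt.
  el.setAttribute('aria-live', 'polite');
  // Announce only the newly added nodes, not the whole region
  // every time anything changes.
  el.setAttribute('aria-atomic', 'false');
}
```

With `aria-atomic="false"`, each new game response is read on its own rather than re-reading the entire transcript each turn.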

But the bigger problem has to do with the "focus" of the screen reader.

When it's time for the user to type, the last line is an invisible input box, where the user types. When the user hits enter, the input box is removed and replaced with text that the user typed. Then the game says some response message, and finally a new invisible input box appears at the bottom of the screen.

Removing the currently focused UI element is very bad for screen readers. The screen reader software tries to guess where to focus next. It might try to focus the user on the top of the document (the very first line at the very beginning of the game), or it might focus on the top of the screen (probably a random moment a few turns back).

We can try to focus on the new input box, but focusing that box may interrupt the live ARIA reading of the game's message.

In an ideal world the WAI-ARIA group would have defined a semantic way to mark the transcript as a "console" or "terminal," indicating to screen readers how to handle the box, but that's wishful thinking. I can't find any other professional web application that handles transcripts the way IF does, for either business or pleasure, so as far as I know, we can't even copy what anyone else has done for 508 compliance. (I'd love to hear from y'all if you know of a web UI that does something like this and actually works.)

An obvious thing to do here would be to abandon the idea that the user should type directly into the transcript, and instead offer the user a fixed box to type in, at the bottom of the screen. That would probably work, but it would break some existing games.

For example, a number of games like to provide the user with a custom prompt inline with the text. "Please enter the secret code: " That may not make sense when the input box appears at the bottom of the screen.

Do y'all have experience with UI like this? How does anybody else handle problems like this?

P.S. I hesitate to mention it, but a few games even rely on the fact that the prompt is in the transcript to play games with the player. For example, "Taco Fiction" prints a fake prompt ">" into the transcript and then waits for the player to type any key. No matter what the user types, the game inserts a character of the author's choosing, "forcing" players to type what the author wants them to type, letter by letter.

I guess it's more of a practical joke than a serious UI consideration, but, needless to say, that joke would not work with a fully separate input box.