wez opened this issue 3 years ago
@wez Good point, thanks for bringing this up. @Tyriar has worked in that field for xterm.js and even proposed an OSC sequence scheme for accessibility annotations to the terminal-wg. This might be a good starting point (cannot find the link to the proposal atm).
👍 having some way of adding some optional alt text would be handy. Here's the link, it was about more generally supporting the idea of alt text even for text since often text is decorative: https://gitlab.freedesktop.org/terminal-wg/specifications/-/issues/18
I think alt text on images is something different than general accessibility. I wonder why I did not find that post of yours earlier, @Tyriar - fun fact: I am actually severely visually disabled.
So regardless of the image protocol, if you would want to implement that OSC for accessibility, and @wez has interest in that too, I would happily implement it as well. (Maybe without rushing it, but I am all in.)
On the alt text: that has a good use, because it can be displayed if an image got evicted. So perfect, and many thanks for this idea.
I'm pleased that there is a proposal for screen reader support.
One of the things that @katef and I were chatting about was incentivizing good accessibility practice in exchange for more fancy terminal features. It would be great if we could give a bit of a shove for accessibility as part of adding this new image protocol. It feels difficult to point application developers towards a draft screen reader extension with no implementations, but if there was a pre-defined metadata field expressly for use by screen readers in the image protocol, then we could recommend that it be used by clients.
That said, the img alt tag has been underused since day one, so it may not be the best model to emulate here. I don't want to force this if it doesn't make sense; just wanted to get us thinking about what would actually be useful while we're designing!
I think the img-alt text does indeed make sense, because it can not only be useful to screen readers but also be shown as alt text when the image is evicted.
Love the alt-text idea. Am considering how it would be exposed to a toolkit. I feel that if it's a secondary function call no one will ever use it, so perhaps a string in the ncvisual_options struct? That's easy enough. Is it useful to throw in generic milquetoast if no alt text is provided? i.e. I could fill in "PNG of size XXXXXXX and geometry 50x20" or even that plus " with a palette rich in greens" etc. if I was feeling froggy.
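A minimal sketch of what that could look like from the caller's side, assuming a hypothetical alt_text string in ncvisual_options (no such field exists today) and the notcurses 2.x ncvisual_render entry point:

```c
/* Sketch only: alt_text is a hypothetical addition to ncvisual_options. */
#include <notcurses/notcurses.h>

static int show_with_alt(struct notcurses* nc){
    struct ncvisual* ncv = ncvisual_from_file("chart.png");
    if(ncv == NULL){
        return -1;
    }
    struct ncvisual_options vopts = {
        .blitter = NCBLIT_PIXEL,
        .scaling = NCSCALE_STRETCH,
        /* .alt_text = "Bar chart of monthly revenue",  // hypothetical field */
    };
    /* render as usual; the toolkit could forward the alt text to the
     * terminal via whatever accessibility sequence gets standardized */
    if(ncvisual_render(nc, ncv, &vopts) == NULL){
        ncvisual_destroy(ncv);
        return -1;
    }
    ncvisual_destroy(ncv);
    return 0;
}
```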
I like the idea of funny-but-probably-not-quite-professional-but-also-not-offensive auto-generated alt text to give the app developer a laugh and not resent the toolkit for being overly opinionated, but also to encourage them to put something more useful in its place. However, in my experience, engineers will just fill it with "" or "." to circumvent perceived arbitrary requirements.
There are folks doing AI/ML based alt tag generation... feels like it would blow up the deps to include by default :-p
One of the things that Kate and I discussed was how a lot of apps emit stuff like figlet or ascii graphics to represent things like progress bars that are terrible from an accessibility perspective. If the toolkit knows that it is emitting graphics for a progress bar then it would be great to auto-fill eg: "50% progress" for the alt text.
(this is a bit OT: wrt GIP, happy to fork off somewhere else)
I think alt text on images is something different than general accessibility. I wonder why I did not find that post of yours earlier, @Tyriar - fun fact: I am actually severely visually disabled.
So regardless of the image protocol, if you would want to implement that OSC for accessibility, and @wez has interest in that too, I would happily implement it as well. (Maybe without rushing it, but I am all in.)
I'm interested in doing something here (modulo my available bandwidth), but I'm not sure what good looks like vs. technically accessible.
I found https://github.com/tspivey/tdsr which implies that there is a desire for something well-integrated in the terminal (eg: tied into the various copy/vim/quick select modes) for navigating beyond what eg: the macOS VoiceOver facility might provide.
One of the things that Kate and I discussed was how a lot of apps emit stuff like figlet or ascii graphics to represent things like progress bars that are terrible from an accessibility perspective. If the toolkit knows that it is emitting graphics for a progress bar then it would be great to auto-fill eg: "50% progress" for the alt text.
While I think this feature has zero intersection with the image protocol, the way I see it, one could design that feature similarly to how OSC 8 was introduced for hyperlinks. There you also have text along with a URL.
So what one would then need (for your non-image, general-purpose idea) is similar to this:
<Introducer> read-out-loud text <Separator> standard VT stream <EndMarker>
In the OSC-8 case, they used one single VT sequence with simply different parameters. The read-out-loud text part could e.g. be the embedded payload of an OSC, whereas the EndMarker is the same OSC but with no payload at all (just like OSC-8 for hyperlinks). (p.s.: I intentionally did not suggest any syntax like OSC numbers or a specific payload format, if any :) )
EDIT: p.s.: you say screen-reader app, but I think that thing should be at least kind of a plugin to your TE, because otherwise you won't have access to that meta-info, or what do I misunderstand here? (I should read the above link, sorry)
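Purely as an illustration of that shape, this is how such a pair of sequences might be emitted, modeled on the real OSC 8 hyperlink syntax; the OSC number 999 and its payload format are invented for the example, i.e. exactly the part the comment above deliberately leaves open:

```c
/* Illustration only: OSC 8 (hyperlinks) is real; OSC 999 is a made-up
 * stand-in for a read-out-loud/accessibility annotation. */
#include <stdio.h>

int main(void){
    /* real OSC 8 hyperlink: open with a URI, close with an empty payload */
    printf("\x1b]8;;https://example.com\x1b\\example\x1b]8;;\x1b\\\n");

    /* hypothetical accessibility annotation following the same pattern:
     * <Introducer + read-out-loud text> standard VT stream <same OSC, empty payload> */
    printf("\x1b]999;50%% progress\x1b\\");   /* Introducer + text */
    printf("[#####     ]");                   /* what is actually drawn */
    printf("\x1b]999;\x1b\\\n");              /* EndMarker */
    return 0;
}
```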
oooh, this is a great idea for progress bars, period; time to add a new flag bit to ncprogbar_options
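And a sketch of the string such a flag could cause the toolkit to synthesize automatically (the flag itself is hypothetical; ncprogbar_options has no such option today):

```c
/* Sketch: auto-generated alt text for a progress bar, e.g. "50% progress".
 * How the string reaches the terminal (struct field, flag, escape sequence)
 * is entirely open; this only shows what the toolkit could synthesize. */
#include <stdio.h>

static void progbar_alt_text(double p, char* buf, size_t buflen){
    snprintf(buf, buflen, "%d%% progress", (int)(p * 100.0 + 0.5));
}

int main(void){
    char alt[32];
    progbar_alt_text(0.5, alt, sizeof(alt));
    puts(alt);   /* prints "50% progress" */
    return 0;
}
```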
One of the things that Kate and I discussed was how a lot of apps emit stuff like figlet or ascii graphics to represent things like progress bars that are terrible from an accessibility perspective. If the toolkit knows that it is emitting graphics for a progress bar then it would be great to auto-fill eg: "50% progress" for the alt text.
While I think this feature has zero intersection with the image protocol, the way I see it, one could design that feature similarly to how OSC 8 was introduced for hyperlinks. There you also have text along with a URL.
So what one would then need (for your non-image, general-purpose idea) is similar to this:
<Introducer> read-out-loud text <Separator> standard VT stream <EndMarker>
In the OSC-8 case, they used one single VT sequence with simply different parameters. The read-out-loud text part could e.g. be the embedded payload of an OSC, whereas the EndMarker is the same OSC but with no payload at all (just like OSC-8 for hyperlinks). (p.s.: I intentionally did not suggest any syntax like OSC numbers or a specific payload format, if any :) )
Yeah, I like this idea; one thing that feels a bit difficult to me in the proposed OSC 200 - 202 is that it feels very temporal or immediate; the speech portion is logically similar to the bell or toast notification behavior, and not clearly associated with regions of the text buffer (that may just be a wording issue in that proposal).
From my brief excursion reading docs for emacspeak and TDSR, it seems like the main value for assistive/reader stuff in the terminal is providing finer control over what is read (character, word, sentence, paragraph, line) and being able to scan backwards or forwards or to "randomly" access the textual content, so it seems like the screen reader hints need to be explicitly tied to the cells in the buffer so that they can be applied when the read cursor reaches that point.
It also seems to me like it might be useful to have audio cues for things like "the rate of output is now faster than a human could speed read it, let alone speak it aloud" where you might reasonably want to CTRL-C it (this feels like a TE feature rather than an escape sequence fed by an application).
EDIT: p.s.: you say screen-reader app, but I think that thing should be at least kind of a plugin to your TE, because otherwise you won't have access to that meta-info, or what do I misunderstand here? (I should read the above link, sorry)
Yeah, it seems that the best possible experience is when the underlying application knows how to do the right thing with a speech synthesizer (and pulls it off), then the second best is for the TE to try to provide essentially an "audio viewport" and reading cursor location that is distinct from the normal visual viewport and input cursor location. TDSR seems like it tries to do this, but there's only so far it can go without being an actual TE for itself.
Image support is not just images: it is also resized text (e.g. VT100 double-width/double-height), CJK, and emojis. So something like alt-text could go much farther than specific images and animations.
It also goes further than accessibility! For example, it would be really cool IMHO to be able to alt-text powerline symbols such that mouse-over could include tooltips for what these blasted symbols actually mean. Or to have the LaTeX source of a math image handy that could be copied to clipboard.
Image support is not just images: it is also resized text (e.g. VT100 double-width/double-height), CJK, and emojis. So something like alt-text could go much farther than specific images and animations.
The way I see this, that could be achieved just like hyperlinks via OSC 8: mark a few grid cells with a tooltip (via a newly introduced OSC) to be shown when the user hovers that grid cell. The tooltip would most likely use OS GUI facilities (i.e. not a purely text-based, terminal-style rendering).
I think that could work and is even easy to implement.
A few other TE devs might not like the fact that the grid cell will become bigger and bigger though :)
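An invented example of that shape, reusing the OSC 8 open/close pattern with a placeholder OSC number (nothing here is a concrete proposal):

```c
/* Invented example: the OSC number and syntax are placeholders.
 * Mark a run of cells with tooltip text, OSC 8-style open/close. */
#include <stdio.h>

int main(void){
    printf("\x1b]998;tooltip=Git branch indicator\x1b\\");  /* open: attach tooltip */
    printf("\xee\x82\xa0 main");   /* UTF-8 for U+E0A0, a powerline branch glyph */
    printf("\x1b]998;\x1b\\\n");   /* close: end of the tooltip-annotated run */
    return 0;
}
```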
It also goes further than accessibility! For example, it would be really cool IMHO to be able to alt-text powerline symbols such that mouse-over could include tooltips for what these blasted symbols actually mean.
I think that should, however, be easily implementable by the controlling terminal app itself, by just registering for mouse tracking events and then creating custom popups (text or image) to contain that information. That should even work for shell programs, I would say.
Or to have the LaTeX source of a math image handy that could be copied to clipboard.
Maybe also just like the above counter-proposal, along with OSC 52 for setting the clipboard? Hmmm
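OSC 52 itself already exists for exactly that: the payload is base64-encoded text, so whatever holds the LaTeX source could place it on the clipboard like this (the source string is just an example):

```c
/* Real sequence: OSC 52 sets a selection; 'c' targets the clipboard and
 * the data must be base64-encoded. */
#include <stdio.h>

int main(void){
    /* "RT1tY14y" is base64 for the LaTeX source "E=mc^2" */
    printf("\x1b]52;c;RT1tY14y\x1b\\");
    return 0;
}
```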
I think @codeofdusk from NVDA would have some useful input here.
Aha. Damn. Why is this for Windows, @codeofdusk? I am actually severely visually disabled; I might end up being a curious beta tester of your app if it can run on Linux also :-)
p.s. @klamonte sorry for replying so late and being a little absent, but I have to do some intense child care for a few more weeks. :)
I don't have a concrete proposal here, but wanted to raise accessibility while we're thinking about this.
It seems prudent to consider adding some metadata for screenreaders; perhaps something like HTML's ALT text and/or ARIA attributes.
@katef may have some thoughts on what is actually useful, or know others that do!