danny-avila / LibreChat

Enhanced ChatGPT Clone: Features Anthropic, AWS, OpenAI, Assistants API, Azure, Groq, o1, GPT-4o, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message search, langchain, DALL-E-3, ChatGPT Plugins, OpenAI Functions, Secure Multi-User System, Presets, completely open-source for self-hosting. Actively in public development.
https://librechat.ai/
MIT License
17.39k stars 2.89k forks

a11y: Updates to the chat are not announced for screen reader users #3570

Open derekjackson-das opened 1 month ago

derekjackson-das commented 1 month ago

Issue description: When using a screen reader and submitting a query or message, there is no indication that the chat thread has been updated.

WCAG Criteria: SC 1.3.1 Info and Relationships and SC 4.1.3 Status Messages

Note: The announcement could be handled in several ways. In discussions, the best solution was to announce messages and message details in a live region that acts as a status/message queue to notify users about the status of the chat (e.g. "LLM typing", "Claude says...", "waiting for reply"), and to keep focus on the input field so users can refine their query or move their focus to the chat thread if they would like to interact further with the message.

danny-avila commented 1 month ago

Made progress on this, still working on announcements for the different message content blocks that OpenAI Assistants, and soon, LibreChat Agents, will generate:

Screen Reader speech logs:

The AI is generating a response. 
Glad you think so! 
What’s on your mind? 
The AI has finished generating a response. 
t
e
l
l
space
m
e
space
a
b
o
u
t
space
t
h
e
space
2
0
2
4
space
o
l
y
m
p
i
c
s

The AI is generating a response. 
The 2024 Summer Olympics, officially known as the Games of the XXXIII Olympiad, are scheduled to take place in Paris, France, from July 26 to August 11, 2024. 
This will be the third time that Paris has hosted the Olympics, having previously hosted in 1900 and 1924. 
Here are some key points about the 2024 Olympics: 
1. 
**Venues**: The events will be held in various iconic locations across Paris and surrounding areas. 
Notable venues will include the Stade Roland Garros for tennis, the Stade de France for athletics, and the Seine River for open-water swimming and triathlon events. 
2. 
**Sustainability**: The Paris 2024 Olympics aims to be one of the most sustainable Olympic Games in history. 
Plans include using existing venues, reducing the use of temporary structures, and prioritizing eco-friendly practices. 
3. 
**New Sports**: The Paris 2024 Games will feature new sports and events, including breaking (breakdancing), which is set to make its Olympic debut. 
4. 
**Cultural Events**: The Olympics will be accompanied by a rich cultural program, showcasing French art, music, and culture to visitors from around the world. 
5. 
**Opening Ceremony**: The opening ceremony will be unique, taking place along the Seine River rather than in a traditional stadium. 
It is expected to be one of the largest public events ever held, with thousands of spectators lining the riverbanks. 
6. 
**Ticketing and Participation**: Ticket sales and participation details have been highly anticipated, with a focus on making the Games accessible to a wider audience. 
If you have any specific questions or topics you're interested in regarding the 2024 Olympics, feel free to ask! 
The AI has finished generating a response. 

derekjackson-das commented 4 weeks ago

I have tested in NVDA with Chrome and VoiceOver in Safari. The announcements are not being made consistently, and not all of the information is announced. The opening and closing announcements about the AI response are probably the most consistently read in both environments.

NVDA and Chrome

This screenshot shows the chat thread and the corresponding speech log from NVDA. The content of the LLM reply is often truncated or not announced at all.

Screenshot 2024-08-16 at 3 11 04 PM

Looking a little closer, it seems that too many live region changes are occurring almost simultaneously, on top of one another. In this clip, the Accessibility Insights inspector is watching for live region events, and it isn't showing every update to the live regions in the DOM as you see them happen in the inspector window. It is also showing the structure changes in the log, but not every one of those changes has a corresponding live region event.

NVDA and Accessibility Insights screen recording

VoiceOver Safari/Chrome

The same thing seems to be happening with VoiceOver. In Safari it isn't announcing all of the updates.

VO screen recording

I added a mutation observer in Chrome to watch the updates and see how they are being announced. You can see the updates coming in one after the other, and the accessibility API or the accessibility tree just can't keep up. On the other hand, very short announcements aren't even getting caught by the observer. I suspect they are being added and removed so quickly that these short replies don't even register.

Chrome mutation observer screen recording

I think one issue is that there are too many live regions on the page, all registering live events in competition with one another. It looks like there are four live regions on the page, two assertive and two polite. We should try reducing this to one live region, or to one assertive and one polite region. This is a good description of the issue with multiple live regions: "Limit the number of live regions on the page." The Scott O'Hara post she refers to is another good reference.

The other issue is that the messages are being cleared out too quickly to register as a change. The mutation observer seems to confirm this: assuming the text is being added to the live region, the times a single word is not read by the screen reader all coincide with the times the mutation observer does not see it. This is another excerpt from Sara Soueidan, about waiting before removing messages and loading new ones: "Empty the live region and wait a bit in between updates."

I haven't used the NerdeRegion tool she refers to yet, but I am going to see how well it works. It might be easier than the hacks I was using in these videos.
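For illustration, the "empty the live region and wait a bit in between updates" advice quoted above could be sketched roughly as follows. This is a minimal sketch, not code from LibreChat: `LiveRegionQueue`, `SetText`, `whenIdle`, and the default delays are all hypothetical, and the right delay values would need testing with real screen readers. The queue writes through a callback so that a single live region (for example, one element with `aria-live="polite"`) receives every message in order.

```typescript
// Hypothetical sketch: serialize announcements through one live region.
// None of these names or delay values come from LibreChat.
type SetText = (text: string) => void;

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

class LiveRegionQueue {
  private pending: string[] = [];
  private draining = false;
  private current: Promise<void> = Promise.resolve();
  private setText: SetText;
  private clearMs: number;
  private holdMs: number;

  // setText would typically be: (t) => { region.textContent = t; }
  // clearMs / holdMs are assumed values, not measured ones.
  constructor(setText: SetText, clearMs = 100, holdMs = 500) {
    this.setText = setText;
    this.clearMs = clearMs;
    this.holdMs = holdMs;
  }

  // Queue a message; it is written only after earlier ones had their turn.
  announce(message: string): void {
    this.pending.push(message);
    if (!this.draining) {
      this.current = this.drain();
    }
  }

  // Resolves once every queued message has been written.
  whenIdle(): Promise<void> {
    return this.current;
  }

  private async drain(): Promise<void> {
    this.draining = true;
    while (this.pending.length > 0) {
      const next = this.pending.shift() as string;
      this.setText('');           // empty the region first
      await sleep(this.clearMs);  // give the accessibility tree time to notice
      this.setText(next);         // then write the new message
      await sleep(this.holdMs);   // keep it in place before the next update
    }
    this.draining = false;
  }
}
```

With this shape, even a one-word reply stays in the region for at least `holdMs` before it is replaced, which is the case the mutation observer was missing.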

danny-avila commented 4 weeks ago

Thanks @derekjackson-das for your thorough testing and research.

It's a good thing we are testing with Groq, as it helps handle the edge case where the generation is streamed extremely fast.

I just tried something that seems to correctly impose a delay so the screen reader can catch up. It also helps to process larger chunks at a time, so I made that change as well.
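As an illustration of what "processing larger chunks at a time" might look like, here is a minimal sketch that buffers streamed fragments and releases only whole sentences for announcement. It is not the change that was actually made: `SentenceChunker` and its boundary rule are hypothetical, chosen only to show the idea.

```typescript
// Hypothetical sketch: buffer streamed text and release whole sentences.
// The class name and the boundary rule are assumptions, not LibreChat code.
class SentenceChunker {
  private buffer = '';

  // Add a streamed fragment; return any complete sentences now available.
  push(fragment: string): string[] {
    this.buffer += fragment;
    const sentences: string[] = [];
    // Treat '.', '!' or '?' followed by whitespace as a sentence boundary.
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.buffer)) !== null) {
      const end = match.index + match[0].length;
      sentences.push(this.buffer.slice(start, end).trim());
      start = end;
    }
    // Keep the unfinished tail for the next fragment.
    this.buffer = this.buffer.slice(start);
    return sentences;
  }

  // Call when the stream ends to release whatever is left.
  flush(): string[] {
    const rest = this.buffer.trim();
    this.buffer = '';
    return rest.length > 0 ? [rest] : [];
  }
}
```

Each returned sentence could then be handed to the live region one at a time, with a delay between writes. A naive boundary rule like this one also splits after list markers such as "1.", so numbered lists would need extra handling.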

On the other hand, very short announcements aren't even getting caught by the observer.

I see the cause of this now and have corrected it. I missed it because I was only testing longer prompts.

With the changes I've made so far, it is announcing all of your examples from your screenshot correctly, but needs a bit more testing, plus:

We should try reducing this to one live region, or to one assertive and one polite region. This is a good description of the issue with multiple live regions: "Limit the number of live regions on the page." The Scott O'Hara post she refers to is another good reference.

This is my next course of action. I think I can get it down to 2 live regions.

danny-avila commented 3 weeks ago

Here's what I generated from changes made in https://github.com/danny-avila/LibreChat/pull/3693:

Screen Recording (with audio)

Test Chat screenshot

Simple Numbered Response Example

Relevant logs from NVDA:

The AI is generating a response. 
One. 
The AI has finished generating a response. 
<!--- next turn --->
The AI is generating a response. 
The word "nuts" is spelled: N-U-T-S... Just like counting! 
The AI has finished generating a response. 
<!--- next turn --->
The AI is generating a response. 
There are 365 days in a non-leap year and 366 days in a leap year. 
But since you asked for a specific number, I'll give you the answer you can count on: Three Hundred Sixty-Five... 
no, just kidding! Seriously, it's either 365 or 366, depending on whether it's a leap year! 
The AI has finished generating a response. 
<!--- next turn --->
The AI is generating a response. 
That would be Montpelier! 
The AI has finished generating a response. 
<!--- next turn --->
The AI is generating a response. 
Here are 5 popular trees: 1. Oak 2. Palm 3. Pine 4. 
Maple 5. Apple 
The AI has finished generating a response. 
<!--- next turn --->
The AI is generating a response. 
A great question about a great city! Here are 5 popular attractions in Boston: 
1. Freedom Trail 2. Fenway Park (home of the Red Sox) 
3. Museum of Fine Arts 4. Faneuil Hall Marketplace 
5. The Institute of Contemporary Art (ICA) Let me know if you'd like more info! 
The AI has finished generating a response.