gexgd0419 / NaturalVoiceSAPIAdapter

Make Azure natural TTS voices accessible to any SAPI 5-compatible application.
MIT License

With Microsoft Edge online voices, NVDA's continuous reading stops after the first sentence #8

Open amirsol81 opened 2 weeks ago

amirsol81 commented 2 weeks ago

Greetings, and thanks for your awesome efforts. The problem is that NVDA can't use Microsoft Edge online voices for continuous reading. When continuous reading is started with NVDA+Down, the first sentence is read, but reading stops after that. So, in effect, continuous reading can't be used in NVDA with Microsoft Edge online voices. I've tested NVDA 2024.2 Release Candidate and NVDA 2024.3 alphas. The offline natural voices are not affected. Can something be done to take care of this issue?

DraganRatkovich commented 2 weeks ago

Hello,

I have the same problem and was going to open an issue for it on GitHub, so thanks for opening it, @amirsol81.

@gexgd0419 Thank you for allowing us to use natural TTS voices in screen readers and other applications. Is there any progress or an answer on this issue?

gexgd0419 commented 6 days ago

NVDA inserts bookmarks into the text to be spoken. When a bookmark is reached during speaking, the TTS engine/voice will tell NVDA the name of the bookmark, so NVDA will know the current speaking progress.

This engine does support bookmarks. It can pass the bookmarks to the natural voices and notify the TTS client (NVDA) when a bookmark is reached, provided the natural voice you are using supports this.

Local Narrator natural voices and Azure natural voices support bookmarks and other features. However, Microsoft Edge voices only support a very limited subset of features, such as volume/rate/pitch adjustment. Bookmarks, unfortunately, are not supported by Edge voices.

Since the Edge voice server will just close the connection immediately if there's any unsupported SSML tag, this engine will remove all unsupported tags before sending the SSML to the server. As a result, you can still hear the text being spoken, but all unsupported elements will be lost. (#2 is another example)
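
To illustrate the tag stripping described above, here is a minimal Python sketch. It is not the engine's actual code, which is not shown in this thread. The allow-list `ALLOWED_TAGS` and the function names are hypothetical; the thread only says that Edge voices support a limited subset such as volume/rate/pitch adjustment, so `prosody` is an assumption, and XML namespaces are ignored for brevity.

```python
import xml.etree.ElementTree as ET

# Assumption: a hypothetical allow-list. The real set of tags accepted by the
# Edge voice server is not given in the thread.
ALLOWED_TAGS = {"speak", "voice", "prosody"}


def strip_unsupported(src: ET.Element) -> ET.Element:
    """Copy src, unwrapping every descendant whose tag is not allowed.

    An unwrapped element loses its tag and attributes, but its text and any
    allowed descendants are kept in place, so the text is still spoken.
    """
    dst = ET.Element(src.tag, dict(src.attrib))
    dst.text = src.text or ""
    last = None  # most recent child appended to dst

    def append_text(text):
        # Text goes either before the first child or after the latest one.
        if not text:
            return
        if last is None:
            dst.text += text
        else:
            last.tail = (last.tail or "") + text

    def copy_children(parent):
        nonlocal last
        for child in parent:
            if child.tag in ALLOWED_TAGS:
                last = strip_unsupported(child)
                dst.append(last)
            else:
                # Drop the tag, keep its text and its (filtered) children.
                append_text(child.text)
                copy_children(child)
            append_text(child.tail)

    copy_children(src)
    return dst


def clean_ssml(ssml: str) -> str:
    """Return the SSML string with unsupported elements unwrapped."""
    return ET.tostring(strip_unsupported(ET.fromstring(ssml)), encoding="unicode")
```

With this sketch, a `<bookmark mark="1"/>` element inside a `<prosody>` element simply disappears from the output while the surrounding text survives, which matches the behaviour described: the text is spoken, but the bookmark is lost.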

NVDA seems to rely on bookmarks to know when a part is completed, so that it can continue with the next part. When using Edge online voices, the text is still spoken, but because bookmarks aren't supported, NVDA never learns that the current part is completed.
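
The stall can be shown with a toy model in Python. This is only an illustration of the behaviour described above, not NVDA's real code; `read_all` and its parameters are hypothetical names.

```python
from collections import deque


def read_all(chunks, voice_fires_bookmarks):
    """Toy model of a client that queues the next chunk only on a bookmark.

    Returns the list of chunks that actually get spoken.
    """
    pending = deque(chunks)
    spoken = []

    def speak_next():
        if not pending:
            return
        spoken.append(pending.popleft())  # the voice speaks this chunk
        # A bookmark sits at the end of each chunk. If the voice reports it,
        # the client moves on; if not, the client waits forever.
        if voice_fires_bookmarks:
            on_bookmark()

    def on_bookmark():
        speak_next()

    speak_next()
    return spoken
```

Calling `read_all(["s1", "s2", "s3"], voice_fires_bookmarks=False)` returns only the first chunk, which mirrors the reported symptom: the first sentence is read and then reading stops.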

There is a possible solution, though. Edge voices don't support bookmarks, but they do support word boundary events, which tell the client when each word starts being spoken. Unfortunately NVDA doesn't use word boundary events, but maybe my engine could be made to simulate bookmark events based on the supported word boundary events.
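
One way such a simulation could work is sketched below in Python. This is an assumption about the approach, not the engine's actual implementation: remember the character offset of each bookmark in the stripped text, then report a bookmark as reached once a word boundary event arrives at or past that offset, and flush any remaining bookmarks when the stream ends. The `<bookmark mark="..."/>` syntax and all function names are chosen for illustration.

```python
import re

# Assumption: bookmarks appear as <bookmark mark="NAME"/> in the input text.
BOOKMARK_RE = re.compile(r'<bookmark\s+mark="([^"]*)"\s*/>')


def split_bookmarks(marked_text):
    """Return (plain_text, [(offset, name), ...]) with bookmarks removed.

    Each offset is the position in plain_text where the bookmark used to be.
    """
    parts, marks = [], []
    pos = length = 0
    for m in BOOKMARK_RE.finditer(marked_text):
        segment = marked_text[pos:m.start()]
        parts.append(segment)
        length += len(segment)
        marks.append((length, m.group(1)))
        pos = m.end()
    parts.append(marked_text[pos:])
    return "".join(parts), marks


def simulate_bookmarks(marks, word_offsets):
    """Merge word boundary offsets with simulated bookmark events.

    marks must be sorted by offset; word_offsets are the character offsets
    reported by the voice, in the order they arrive.
    """
    events = []
    i = 0
    for offset in word_offsets:
        # Everything before this word has been spoken, so any bookmark at or
        # before this offset has been passed.
        while i < len(marks) and marks[i][0] <= offset:
            events.append(("bookmark", marks[i][1]))
            i += 1
        events.append(("word", offset))
    # End of stream: bookmarks after the last word are reported now.
    while i < len(marks):
        events.append(("bookmark", marks[i][1]))
        i += 1
    return events
```

A trade-off of this design is timing: a simulated bookmark can only fire when the next word boundary arrives (or when the stream ends), so it may be reported slightly later than a native bookmark would be.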

gexgd0419 commented 5 days ago

> maybe my engine could be made to simulate bookmark events, based on the supported word boundary events

This works! With simulated bookmark events, NVDA continuous reading can work correctly. Online voices can be slow, though.

The fix will be in the next release version.

amirsol81 commented 5 days ago

@gexgd0419 Mega thanks for your efforts, and really looking forward to the next release. Can this fix also help with other related issues? For instance, if we use Edge online voices, pressing Alt+Tab or Windows+M to reach the Desktop makes NVDA just say "Desktop". NVDA doesn't read the focused item on the Desktop. I guess this is yet another bookmark-related issue with NVDA.

mariopercinic commented 15 hours ago

I can confirm the same behaviour with NVDA when it is on the desktop, and also the same with continuous reading. By the way, you said that the whole reading process might be slow when online voices are being used. Is there any way to compare how slow that is, and could it be sped up somehow?