System.Speech.Synthesis.SpeechSynthesizer not implemented in core?

jasonliaocn commented 4 years ago

I'm trying to migrate some WPF applications from .NET Framework to .NET Core, but I cannot find a replacement for System.Speech.Synthesis.SpeechSynthesizer, which lives in System.Speech.dll. The MSDN docs say it is not supported in .NET Core. Was it removed, or how can we use the speech functionality in Windows 10 without Azure Speech Services?

Dotnet-GitSync-Bot commented 4 years ago

I couldn't figure out the best area label to add to this issue. Please help me learn by adding exactly one area label.

danmoseley commented 4 years ago

@AlexGhiondea is this an Azure service that your team tracks?

jkotas commented 4 years ago

dotnet/runtime#30991

danmoseley commented 4 years ago

Thank you, I had somehow misread. @jasonliaocn does that answer your question? Have you considered the Azure service - is it an option?

duaneking commented 4 years ago

I'm also blocked by not having this.

This System.Speech namespace is VITAL to the visually disabled community, as vital as System.Console is for everybody else. Not having this has been a HUGE DEAL as an Azure service is NOT AN OPTION.

The community NEEDS a LOCAL running system that can create this speech. Nothing non-local will work.

terrajobst commented 4 years ago

Having an option that doesn't require online access makes sense.

Is cross-platform a requirement though? System.Speech is a wrapper around Windows COM APIs and thus isn't cross-platform, even if we were to port it to .NET Core.

duaneking commented 4 years ago

I think it could be.

However, I think the System.Speech API interface is really what is needed here, and not just Speech.Say: we need both ends so we can get input and output via speech if we do the work.

I think if it supports Win10 and is open source that will be enough to get the community involved.

danmoseley commented 4 years ago

Moved to WPF since that is where the expertise lies for this API.

birbilis commented 4 years ago

From what I remember from when I was developing SpeechLib (https://github.com/Zoomicon/SpeechLib) there was a separate Microsoft.Speech and a System.Speech namespace with similar (but not exactly the same/compatible) functionality. I think the former one was promoted for Kinect.

The homepage of that wrapper lib (which tried to hide any differences between those APIs) points to some projects that show how to use it: SpeechTurtle (simple speech-based turtle graphics) and the more complex TrackingCam (tracking a presenter), or that were forked and expanded to use it: Hotspotizer (Kinect gesture and speech recognition to simulate keypresses). Those three projects effectively also serve as use cases for speech synthesis and recognition. Recognition there works via defined command dictionaries, not free-form speech recognition, although I guess one could define some streaming text API for that too. I'm not sure whether the Microsoft and System speech APIs supported continuous speech recognition; one could check whether the Azure Cognitive APIs define some API for that.

birbilis commented 4 years ago

@kolappannathan - moving discussion (sorry, long post, hope some of the links are useful) from https://github.com/dotnet/runtime/issues/30991#issuecomment-619694114 here too as requested

From what I remember, Microsoft.Speech was part of the Microsoft Speech Platform (probably also related to the older MS Speech Server that had become Office Live Communications Server), which I think was intended for telephony-based services on servers (e.g. recognizers may have been fine-tuned for such scenarios). If I remember correctly, it was also shipped with the Kinect v1 for Windows installers (maybe v2 too).

That should be the reason I was supporting both at that SpeechLib library (I remember setting up recognition dictionaries had some differences but I could abstract them more or less).

From comments in the SpeechLib code, I think System.Speech.Recognition and Microsoft.Speech.Recognition were both working on Windows (and with the KinectRecognizer too) but Microsoft.Speech.Synthesis wasn't working on Windows (probably the Kinect installer didn't bother to install the Runtime Languages for speech synthesis, see link below). See the includes and comments at

Speaking of the comments at SpeechRecognitionKinectV1 (a descendant class of my SpeechRecognition one), I see a pointer to https://web.archive.org/web/20160202041952/http://kin-educate.blogspot.gr/2012/06/speech-recognition-for-kinect-easy-way.html (the original URL isn't available); that code also uses the Microsoft.Speech namespace.

Links:

on MS Speech Server: https://en.wikipedia.org/wiki/Microsoft_Speech_Server
on MS Speech Platform:
Runtime: https://www.microsoft.com/en-us/download/details.aspx?id=27225
SDK: https://www.microsoft.com/en-us/download/details.aspx?id=27226, https://www.voiceelements.com/docs/programmable-voice/speech-recognition/install-microsoft-speech-platform/
Runtime Languages: https://www.microsoft.com/en-us/download/details.aspx?id=27224
Server-side runtime: https://www.microsoft.com/en-us/download/details.aspx?id=24974
retired article on Speech Technologies: https://docs.microsoft.com/en-us/previous-versions/office/developer/speech-technologies/hh323806(v=office.14)?redirectedfrom=MSDN
A nice app that uses various MS Speech SDK from what I understand from discussion on reallusion forums (iClone and other face animation apps must be supporting speech synthesis and/or speech recognition too via SAPI for doing lip syncing): http://www.cross-plus-a.com/balabolka.htm
Link on SAPI: https://en.wikipedia.org/wiki/Microsoft_Speech_API (think SAPI had replaced older syntax)
first and last post at these three articles of mine: https://zoomicon.wordpress.com/tag/speech/

birbilis commented 4 years ago

BTW, I know there was a trend to move all such services to the cloud, but currently there's also a reverse trend to move them onto at least IoT devices (edge computing), so why not back to one's computer/notebook/phone too? What a dev needs is abstractions with pluggable implementations, so that they aren't bothered by the implementation details of the specific service chosen and/or can switch services on the fly based on network connectivity, available power (battery), CPU and free space on the client device, functionality provided by the client device's OS or hardware, etc.

birbilis commented 4 years ago

I know the thread is about speech synthesis, but speech recognition lives under the same Speech namespace (both the Microsoft and the System one), and verbal commands with a predefined syntax (not free-form speech recognition) are very useful for app control and accessibility, so these may be useful too:

in the SpeechTurtle app (voice controlled turtle graphics) I construct a Grammar programmatically: https://github.com/Zoomicon/SpeechTurtle/blob/master/Grammars/SpeechGrammar_en.cs
in the extended version of Hotspotizer (keypress generator app based on Body Gestures and Speech) I load SRGS grammar files: see code at https://github.com/birbilis/Hotspotizer/blob/master/Hotspotizer.WPF/MainWindow.SpeechUtils.cs and https://github.com/birbilis/Hotspotizer/tree/master/Hotspotizer.WPF/Grammars/SRGS for the .xml files for the grammars (they can be loaded and unloaded to have better recognition at specific UI contexts during the app runtime or merged / loaded together too if needed)
the code in TrackingCam (an app that uses various technologies to follow a speaker walking back and forth on a podium, via a robotic cam or by zooming in on a wider image etc., and to get commands from them) is similar to that in Hotspotizer, but a bit more evolved and maybe not as easy to follow (it uses MEF, the Managed Extensibility Framework, to load plugins, and falls back from Kinect speech recognition to Windows speech recognition if the former is not available), see https://github.com/Zoomicon/TrackingCam/blob/master/TrackingCam/MainWindow.Speech.cs and https://github.com/Zoomicon/TrackingCam/tree/master/TrackingCam/Grammars

duaneking commented 4 years ago

I consider both speech synthesis and speech recognition to be issues here; both are needed equally as standard parts of .NET/.NET Core.

This is not about just speaking. This is about making an app usable using only speech with no sight required. It asks you questions. You answer. It does things. The goal is to not need your eyes at all for what could otherwise be console apps.

If MSFT is true to its stated values of inclusiveness, this should be an easy thing.

This System.Speech namespace is VITAL to the visually disabled community, as vital as System.Console is for everybody else. Not having this has been a HUGE DEAL as it locks people out. Right now, the API's effectively show a preference for sighted people and I feel like that's a missed opportunity for inclusion, to say it as lightly as I can.

TylerGubala commented 4 years ago

All accessibility functions should be available in all frameworks. Otherwise developers will either be shackled to one framework or will be forced to make tightly coupled implementations for themselves, like this one I found that calls powershell just to get the speech service.

I think it's possible to do better in this area. Azure should not be the end-all-be-all IMO. Sometimes my internet goes out.

fredm73 commented 4 years ago

I'd like to add my voice: I have worked with blind people in various countries to bring chess to them (called "chessSpeak"). I'd like to convert it to Core 3.1 (Windows desktop only). I looked at an Azure solution, but that is not really viable for free software.

victorvhpg commented 4 years ago

> All accessibility functions should be available in all frameworks. Otherwise developers will either be shackled to one framework or will be forced to make tightly coupled implementations for themselves, like this one I found that calls powershell just to get the speech service.
>
> I think it's possible to do better in this area. Azure should not be the end-all-be-all IMO. Sometimes my internet goes out.

Yes, I agree. We need a local/offline solution like System.Speech.

coderb commented 3 years ago

+1 for local speech api on windows

ocdtrekkie commented 3 years ago

This basically writes off any interest I have in moving into .NET Core/.NET 5, and that's pretty disappointing. Cloud isn't a viable answer.

neodon commented 3 years ago

I'm disappointed there isn't some alternative local speech synthesis and recognition solution in .NET Core. Don't get me wrong - .NET Core is incredible and this is just a small piece. But it's in the 80% of the most important things we need, especially to support those in our community with vision and hearing challenges.

duaneking commented 3 years ago

Anything that is a server or client that goes over the network or a network interface does not meet basic accessibility requirements here as the network increases lag, costs, etc and is a burden on the user.

Many who are blind don't even have easy access to the internet, as a computer with the accessibility tools required is often out of their price range and open source options are limited and actively fought against by the big companies that make the most money from selling themselves via insurance claims.

Also, a braille terminal is EXPENSIVE; most people who need them have to use insurance to buy them, because they can cost thousands of dollars depending on the model and most of the time people who want them don't have that money.

.NET Core NEEDS System.Speech.* and a Console.Speak(string text); standard to be inclusive and support the disabled. An Azure server or service is directly at odds with the accessibility requirements here and will not work.

Carlos487 commented 3 years ago

Adding support to the Microsoft.Windows.Compatibility package would be great, at least for porting existing Windows applications.

duaneking commented 3 years ago

Microsoft publicly states it's committed to accessibility at https://www.microsoft.com/en-us/accessibility yet something seems to be stopping the company from aligning as One Microsoft in order to support Diversity and Inclusion and Accessibility in this way. May I ask what that is?

lindomar-marques55 commented 3 years ago

No matter how much they try to promote Azure, for some programmers, as in my case, Azure is definitely out of the question. Killing System.Speech and its SpeechSynthesizer will bring many difficulties for programmers with few resources, and for programs meant to be used in communities without real-time network access.

danmoseley commented 3 years ago

Hello everyone. Thank you for your patience, and apologies for being silent for a little while. You've made it really clear there's a lot of demand for this and we have recently been working to make this open source: we got the last approvals we needed today and I have pushed up a PR just now. When that goes through, we can push up a NuGet package and I will ask you folks to confirm for me that it works for you. As you know, this is a Windows-only legacy speech tech that will not receive new investment or features: we're porting it so that all the existing code and users you've told us about above can continue to work on .NET Core/.NET 5, Powershell Core etc.

cc @terrajobst @fabiant3

ocdtrekkie commented 3 years ago

@danmosemsft That's fantastic to hear. It means I can see a path forward again for migrating to .NET 5!

fredm73 commented 3 years ago

My compliments to Microsoft. I have long admired this company and am confirmed in my expectations.

-- Fred Mellender Rochester, NY https://sites.google.com/site/fredm/

duaneking commented 3 years ago

The code drop: https://github.com/dotnet/runtime/pull/45941

If this is the entire full stack, does that mean we can also recreate voices?

ocdtrekkie commented 3 years ago

@duaneking They are open sourcing the System.Speech bindings to call the Windows Speech API. They aren't open sourcing the Windows speech components.

duaneking commented 3 years ago

Thank you again for releasing this publicly.

How are we supposed to debug/rebuild/etc. voices and voice recognition if we don't have the tools, because they were not released as part of this?

Without the code for the rest of the system, I'm not going to be able to port it all to the other platforms as I had planned. So I'm appreciative, but I also feel like MS isn't being fully supportive, and respectfully, it feels like a bait and switch.

ocdtrekkie commented 3 years ago

This is mostly just intended to let Windows apps that rely on the Windows Speech API switch over to .NET Core. But you still need to be on a system that has Windows' speech platform to use it.

(I'd love a good cross-platform on-device speech library, that's just not what's happening here.)

terrajobst commented 3 years ago

@duaneking

See this comment above where I asked whether cross-platform was a requirement. System.Speech is a .NET wrapper over Win32/COM APIs; so we're open sourcing what we have; we don't have plans to build a speech service from the ground up. We (the .NET team) do not have any plans to invest in this tech; as @ocdtrekkie said: this is for porting existing code. New code should look into Azure Speech service.

birbilis commented 3 years ago

Cross-platform apps that want to work in disconnected mode can wrap this for Windows and wrap some other engine on Linux etc. For example SpeechLib (https://github.com/Zoomicon/SpeechLib) was wrapping both Microsoft.Speech and System.Speech. Could use that as basis to wrap more engines. Could even detect connected mode and use (similarly wrapped as a pluggable engine) the Azure Speech service (though that can end up costing too much or eat up any free credits one has I guess so not much of an option for free and FOSS apps)

duaneking commented 3 years ago

@terrajobst Respectfully, that doesn't work for the community.

Any investment in voice on Azure or needing a network to work is antithetical to the needs of the community.

I had hoped all the code would be made available so that the community could port it as needed as open source.

terrajobst commented 3 years ago

> I had hoped all the code would be made available so that the community could port it as needed as open source.

I’m sorry, I thought that was clear when I said that System.Speech simply calls Windows APIs. I should have made it more clear that the best we can do is release System.Speech itself, not the underlying OS implementation.

duaneking commented 3 years ago

On Linux and MacOSX, festival and flite might be a simple plug-in option.
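
As a rough illustration of that plug-in idea, an app could pick a local engine per platform before speaking. In the sketch below, `Select-SpeechEngine`, its parameters, and the preference order are all assumptions made for illustration; it only decides which engine to use and does not call System.Speech, flite or festival itself.

```powershell
# Hypothetical helper: picks a local speech engine name for the current platform.
function Select-SpeechEngine {
    param(
        # $IsWindows is an automatic variable in PowerShell 6 and later.
        [bool] $OnWindows = $IsWindows,
        # Names of command-line tools found on the machine, e.g. gathered via Get-Command.
        [string[]] $AvailableCommands = @()
    )

    # On Windows, use the System.Speech wrapper over the OS speech components.
    if ($OnWindows) { return 'System.Speech' }

    # Elsewhere, fall back to the first installed engine (the order is an arbitrary assumption).
    foreach ($candidate in 'flite', 'festival') {
        if ($AvailableCommands -contains $candidate) { return $candidate }
    }

    # No local engine was found.
    return $null
}
```

For example, `Select-SpeechEngine -OnWindows $false -AvailableCommands 'flite'` returns `flite`, and the caller would then dispatch to whatever wrapper it has for that engine.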

Dotnet-GitSync-Bot commented 3 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

danmoseley commented 3 years ago

Gonna transfer this back to where the code now is.

danmoseley commented 3 years ago

Note this hasn't shipped yet. https://github.com/dotnet/runtime/pull/45941 needs to complete, and it needs shiproom approval, then I'm aiming to get it out in the Feb patch Tuesday if we can.

danmoseley commented 3 years ago

This is on track to go out Feb 9th both standalone and as part of the Windows compat pack.

danmoseley commented 3 years ago

Hello everyone, this shipped today. The NuGet package is System.Speech, or you can get it by updating the Windows Compatibility Pack package reference.

Could folks please post back here to confirm it works successfully for them? I'd appreciate that.

danmoseley commented 3 years ago

As a .NET Standard 2.0 library, this will work on all supported versions of .NET Core (ie back to 2.1)

duaneking commented 3 years ago

Is platform compatibility a goal for this release?

danmoseley commented 3 years ago

@duaneking could you clarify? If you mean, is it a goal that code written against the .NET Framework library works against this one, yes. I do not know of cases where it would not. However, the main focus is the core synthesis/recognition capability, which is what led to the original asks, and not the broader range of scenarios supported by the API. That would influence whether we make any fix.

ocdtrekkie commented 3 years ago

@duaneking As previously discussed, this release is porting the Windows-only System.Speech API to work on .NET Core. It calls components built into Windows, and is not available outside of it.

danmoseley commented 3 years ago

@ocdtrekkie thanks, I understand the question now. Correct, there is no plan to make this work on any other OS. This is fundamentally a wrapper around OS functionality.

lukeb1961 commented 3 years ago

PowerShell 7.2.0-preview.3
Copyright (c) Microsoft Corporation.

https://aka.ms/powershell
Type 'help' to get help.

PS C:\Users\LukeB> find-module System.Speech
Find-Package: C:\program files\powershell\7-preview\Modules\PowerShellGet\PSModule.psm1:8879
Line |
8879 |  PackageManagement\Find-Package @PSBoundParameters | Microsoft …
     |  ~~~~~~~~~~~~~
     | No match was found for the specified search criteria and module name 'System.Speech'. Try
     | Get-PSRepository to see all available registered module repositories.

PS C:\Users\LukeB> find-package System.Speech

Name                           Version          Source           Summary
System.Speech                  5.0.0            nuGet.org v2     Provides types to perform speech synthesis and speech…
System.Speech                  5.0.0            nuGet.org        Provides types to perform speech synthesis and speech…

PS C:\Users\LukeB> find-package System.Speech | install-package -Force
Install-Package: Dependency loop detected for package 'System.Speech'.
PS C:\Users\LukeB>

danmoseley commented 3 years ago

@lukeb1961 thanks for the report. Could you please open a fresh issue, and we'll take a look?

for some reason, I get this far, then it stops.

PowerShell 7.2.0-preview.3
Copyright (c) Microsoft Corporation.

https://aka.ms/powershell
Type 'help' to get help.

PS C:\Windows\System32>  find-package system.speech

Name                           Version          Source           Summary
----                           -------          ------           -------
System.Speech                  5.0.0            nuget.org        Provides types to perform speech synthesis and speech…

PS C:\Windows\System32>  find-package system.speech | install-package -scope currentuser

The package(s) come(s) from a package source that is not marked as trusted.
Are you sure you want to install software from 'nuget.org'?
[Y] Yes  [A] Yes to All  [N] No  [L] No to All  [S] Suspend  [?] Help (default is "N"): A

danmoseley commented 3 years ago

I'm not very knowledgeable about PowerShell, but from an internet search, it seems that -skipdependencies can avoid this hang. The following makes speech for me:

PowerShell 7.2.0-preview.3
Copyright (c) Microsoft Corporation.

https://aka.ms/powershell
Type 'help' to get help.

PS C:\Windows\System32> cd \test
PS C:\test> find-package System.Speech | install-package -scope currentuser -skipdependencies -destination .
PS C:\test> $a = [System.Reflection.Assembly]::LoadFrom('C:\test\System.Speech.5.0.0\runtimes\win\lib\netcoreapp2.1\System.Speech.dll')
PS C:\test> $ss = [System.Speech.Synthesis.SpeechSynthesizer]::new()
PS C:\test> $ss.SetOutputToDefaultAudioDevice()
PS C:\test> $prompt = [System.Speech.Synthesis.Prompt]::new('hello world')
PS C:\test> $ss.Speak($prompt)
PS C:\test>

There's probably a more efficient way to do it, as I say I'm not very knowledgeable about Powershell, but this proves that Speech works on Powershell.
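
For reference, the same steps can be folded into one function. This is only a sketch: `Invoke-LocalSpeech` and its parameters are hypothetical names, the default path is copied from the transcript above and assumes the same install location, and `Add-Type -Path` stands in for `Assembly.LoadFrom`. It still requires Windows, since System.Speech wraps the OS speech components.

```powershell
# Hypothetical wrapper around the steps shown in the transcript; not part of System.Speech.
function Invoke-LocalSpeech {
    param(
        [Parameter(Mandatory)]
        [string] $Text,

        # Assumed location of the extracted package, copied from the transcript.
        [string] $AssemblyPath = 'C:\test\System.Speech.5.0.0\runtimes\win\lib\netcoreapp2.1\System.Speech.dll'
    )

    # Load System.Speech.dll into the current session.
    Add-Type -Path $AssemblyPath

    $synth = [System.Speech.Synthesis.SpeechSynthesizer]::new()
    try {
        $synth.SetOutputToDefaultAudioDevice()
        # Speak(string) is synchronous: it returns once the text has been spoken.
        $synth.Speak($Text)
    }
    finally {
        # Release the underlying speech resources even if Speak throws.
        $synth.Dispose()
    }
}
```

With the package extracted as above, `Invoke-LocalSpeech -Text 'hello world'` should speak through the default audio device.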

Can you confirm this works for you @lukeb1961 ?

lukeb1961 commented 3 years ago

yes, -SkipDependencies worked immediately.

danmoseley commented 3 years ago

Ok good. That might be worth reporting to the nuget repo.

dotnet / runtime

System.Speech.Synthesis.SpeechSynthesizer not implemented in core? #46730