dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Bring System.Speech to .Net Core or add some alternative #30991

Closed kolappannathan closed 4 years ago

kolappannathan commented 5 years ago

The System.Speech API is not available in .NET Core or .NET Standard. Currently, there are no alternatives for synthesizing audio locally on these frameworks. Kindly bring an API for this.

I am trying to migrate a project from .NET Framework to .NET Core, but this is blocking the migration.

danmoseley commented 5 years ago

@preethikurup @terrajobst have we already reached out to owners of System.Speech? I don't recall working with that team.

terrajobst commented 5 years ago

> @preethikurup @terrajobst have we already reached out to owners of System.Speech? I don't recall working with that team.

Yes, they are now part of Azure and there is a new offering. System.Speech is not evolved anymore.

terrajobst commented 5 years ago

We don't plan on bringing this .NET Framework API to .NET Core. See this announcement for details.

kolappannathan commented 5 years ago

@terrajobst Does .NET Core have a viable alternative for local speech synthesis?

duaneking commented 4 years ago

I'm also blocked by not having this.

The System.Speech namespace is VITAL to the visually disabled community, as vital as System.Console is for everybody else. Not having it has been a HUGE DEAL.

We NEED a viable, LOCAL speech synthesis API. An Azure service that has to be used over the network WILL NOT WORK for the disabled community.

birbilis commented 4 years ago

Maybe you could use Microsoft.Speech functionality instead if System.Speech is not available.

kolappannathan commented 4 years ago

@birbilis There is no such package as Microsoft.Speech. Did you mean Microsoft.CognitiveServices.Speech? If yes, that package will not work without connecting to Azure.

birbilis commented 4 years ago

@kolappannathan there used to be a Microsoft.Speech one too, check my code at https://github.com/Zoomicon/SpeechLib/blob/master/SpeechLib.Recognition/SpeechRecognition.cs

danmoseley commented 4 years ago

Please move discussion to the open issue https://github.com/dotnet/wpf/issues/2935

kolappannathan commented 4 years ago

@birbilis Wow. I searched on NuGet.org and couldn't find the said package. Is it still available?

blakepell commented 4 years ago

The old SDK worked well and, most importantly, worked locally. The new one is about getting people to subscribe to it on Azure, which isn't that bad except that you have to be online for it to work. It's a shame this can't be done locally anymore.

ststeiger commented 4 years ago

I've ported it, if you need it: https://github.com/ststeiger/VoiceRecognition/tree/master/System.Speech

However, the quality is borderline crap, I have no idea how you can get support for a particular language, and it only works on Windows (the Speech SDK needs to be installed). For some reason, more languages are available in the full .NET Framework version.

You'd better look for a more modern speech SDK, such as those from Facebook (wav2letter++), Baidu (DeepSpeech2), Kaldi, Julius, CMUSphinx, or Mozilla (DeepSpeech).

Google also has an excellent API for that, but it isn't particularly cheap.

ocdtrekkie commented 4 years ago

This is really unfortunate to run into, and it ends my interest in .NET Core/.NET 5. Without local speech, the platform is useless to me.

ststeiger commented 4 years ago

@ocdtrekkie: You could use kaldi-gstreamer-server , then you can do it from .NET via web-sockets: https://github.com/alumae/kaldi-gstreamer-server

duaneking commented 3 years ago

Anything that is a server or client that goes over the network does not meet basic accessibility requirements here, as the network increases lag, costs, etc., and is a burden on the user.

Many who are blind don't even have easy access to the internet, as a computer with the required accessibility tools is often out of their price range, and open source options are limited and actively fought against by the big companies that make the most money from selling themselves via insurance claims.

Also, a braille terminal is EXPENSIVE; most people who need one have to use insurance to buy it, because they can cost thousands of dollars depending on the model, and most of the time the people who want them don't have that money.

.NET Core NEEDS System.Speech.* and a Console.Speak(string text); standard to be inclusive and support the disabled. An Azure server or service is directly at odds with the accessibility requirements here and will not work.

duaneking commented 3 years ago

> However, the quality is borderline crap

This is common for speech synthesis; it always sounds bad to the community because a natural-sounding voice requires vox sampling and other work that is not trivial. That's why Siri/Cortana/Google etc. all use a sacrificial real person's voice as the base phonetic mapping for the retail public offering that sighted people get access to. But thankfully, the voice itself is a separate problem, so we don't need to worry about that here.

> I have no idea how you can get support for a particular language

Register a voice for that language, one that supports that language's phonetics. Again, that's a voice construction issue and not part of this request, technically, since MS already has voices we can use to get this functional as an MVP on Windows.

The plugin for Linux/Mac support could just be a flite- or festival-based plugin for the v1. All it has to do is pass the language and the text to be spoken and let the system handle that asynchronously.

> and it only works on Windows.

That's easy to fix if the code is portable. Otherwise you need a public interface that calls a sub-method which checks the current config/platform and the current platform's installed voices, matches that platform and language to a voice, then calls that voice with the input text to synthesize the final speech. The method a developer would need to worry about might have a signature like:

ResultEnum speak(Context context, string text);

... where the platform and language used are in the context object, so a platform that is not supported could just return ResultEnum.NOT_SUPPORTED_YET when the engine being delegated to cannot support it.

The speech engine could easily do this. Providing support does not require full support for multiple platforms; it just needs Windows support and a good architectural design that allows devs to add support via code extension of a base object leveraging a common, public-enough interface. This could just be a plugin architecture, and it would work.
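The plugin architecture described above could be sketched roughly as follows. All names here (`ResultEnum`, `SpeechContext`, `ISpeechEngine`, `WindowsOnlyEngine`) are hypothetical illustrations of the idea, not any real API:

```csharp
using System;

// Hypothetical result type: NOT_SUPPORTED_YET signals that the delegated
// engine cannot handle the requested platform/language combination.
public enum ResultEnum { Ok, NotSupportedYet }

// Hypothetical context object carrying platform and language, as suggested above.
public sealed class SpeechContext
{
    public string Platform { get; set; } = "";   // e.g. "windows", "linux"
    public string Language { get; set; } = "";   // e.g. "en-US"
}

// The common, public-enough interface that plugin engines would implement.
public interface ISpeechEngine
{
    ResultEnum Speak(SpeechContext context, string text);
}

// Example plugin: supports Windows only, returning NotSupportedYet elsewhere,
// as a v1 might before flite/festival plugins exist for Linux/Mac.
public sealed class WindowsOnlyEngine : ISpeechEngine
{
    public ResultEnum Speak(SpeechContext context, string text)
        => context.Platform == "windows" ? ResultEnum.Ok : ResultEnum.NotSupportedYet;
}

public static class Demo
{
    public static void Main()
    {
        ISpeechEngine engine = new WindowsOnlyEngine();
        var win = new SpeechContext { Platform = "windows", Language = "en-US" };
        var mac = new SpeechContext { Platform = "macos", Language = "en-US" };
        Console.WriteLine(engine.Speak(win, "hello"));  // Ok
        Console.WriteLine(engine.Speak(mac, "hello"));  // NotSupportedYet
    }
}
```

The point is only the shape: callers program against the interface, and each platform's support arrives as a new implementation rather than a change to the core API.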

> For some reason, more languages are available in the full .net framework version.

Again, voices are based on languages, so you're going to expect to use a different voice based on your language preferences.

Carlos487 commented 3 years ago

Adding support to Microsoft.Windows.Compatibility would be great, at least to port existing applications on Windows.

lindomar-marques55 commented 3 years ago

Speech and speech recognition systems are the fashion of the moment. Unfortunately, Microsoft saw a great chance to make money there. That's why there have been so many updates eliminating options from APIs that existed in Windows: System.Speech and its SpeechSynthesizer were eliminated because, while they were available, they were an obstacle to Microsoft selling Azure. The worst thing is that programmers accepted this manipulation and have encouraged the use of Azure. However, programmers who are unable to acquire Azure licenses, or who have difficulty maintaining a stable connection to use the service, find themselves at a dead end. This explains why Microsoft wants to eliminate the .NET Framework, as that kills most of these old APIs that do not generate income for Microsoft. Very sad. I have used these APIs in a local offline application for communities that do not have real-time network access, and I am worried about the moment when Microsoft ends support for the .NET Framework.

ststeiger commented 3 years ago

One cool feature is that you can use the text-to-speech engine integrated into Google Chrome from JavaScript. That way, you can do it right in the browser. No server backend required, no server round trip necessary.

speechSynthesis.getVoices().forEach(function (voice) {
  console.log(voice.name, voice.default ? '(default)' : '');
});

var msg = new SpeechSynthesisUtterance();
var voices = window.speechSynthesis.getVoices();
msg.voice = voices[10];
msg.volume = 1; // from 0 to 1
msg.rate = 1;   // from 0.1 to 10
msg.pitch = 1;  // from 0 to 2
msg.text = "Como estas Joel";
msg.lang = 'es';
speechSynthesis.speak(msg);

https://dev.to/asaoluelijah/text-to-speech-in-3-lines-of-javascript-b8h

lindomar-marques55 commented 3 years ago

Sorry, but I don't know anything about JavaScript yet.


kolappannathan commented 3 years ago

Update from linked issue:

> Hello everyone. Thank you for your patience, and apologies for being silent for a little while. You've made it really clear there's a lot of demand for this and we have recently been working to make this open source: we got the last approvals we needed today and I have pushed up a PR just now. When that goes through, we can push up a NuGet package and I will ask you folks to confirm for me that it works for you. As you know, this is a Windows-only legacy speech tech that will not receive new investment or features: we're porting it so that all the existing code and users you've told us about above can continue to work on .NET Core/.NET 5, Powershell Core etc.

birbilis commented 3 years ago

That looks promising. It might need "unlocking" the class hierarchy or injecting some interface that the classes implement to make it extensible by others with implementations for other platforms (remember, various classic .NET APIs had classes you couldn't inherit from, and later on they were made to implement interfaces, which allows for better extensibility without breaking old stuff). I don't remember the API though.

From what I remember, I had tried to abstract the basic speech synthesis and recognition functionality at https://github.com/Zoomicon/SpeechLib to cover both Microsoft.Speech and System.Speech, which had close but somewhat different APIs. One could use that as a starting point for a cross-platform abstraction with pluggable speech I/O engines (I was also using MEF there so that I could consume it from the https://github.com/Zoomicon/TrackingCam app, which worked with pluggable functionalities).

danmoseley commented 3 years ago

System.Speech was ported to .NET Core. It's on NuGet at https://www.nuget.org/packages/System.Speech, or get it by updating the Windows Compatibility Pack package reference.

danmoseley commented 3 years ago

As a .NET Standard 2.0 library, this will work on all supported versions of .NET Core (i.e. back to 2.1).
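For anyone landing here from a migration, basic usage of the ported package looks the same as on .NET Framework (a minimal sketch; requires the System.Speech NuGet package and runs on Windows only, since the library wraps the Windows speech APIs):

```csharp
// Windows only: System.Speech wraps the built-in Windows speech engine.
using System.Speech.Synthesis;

class Program
{
    static void Main()
    {
        // SpeechSynthesizer is IDisposable; dispose it when done.
        using var synth = new SpeechSynthesizer();

        synth.SetOutputToDefaultAudioDevice();
        synth.Rate = 0;      // speaking rate, -10 (slowest) to 10 (fastest)
        synth.Volume = 100;  // 0 to 100

        // Speak blocks until the utterance finishes; SpeakAsync is also available.
        synth.Speak("Hello from .NET Core");
    }
}
```

Existing .NET Framework code using SpeechSynthesizer should compile unchanged after adding the package reference.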