Neos-Metaverse / NeosPublic

A public issue/wiki only repository for the NeosVR project

Voice to text input #85

Open sirkitree opened 6 years ago

sirkitree commented 6 years ago

I'd love to have an option on the keyboard to use my voice for text input. Typing on virtual keyboards is somewhat cumbersome and tedious. This would also be great for spawning text notes, and just talking at them, and have a text object to do something with.

Frooxius commented 6 years ago

I've had a few people ask for this as well. Do you have a suggestion for a good API, though? Someone recommended the Google Speech API, but that tends to produce gibberish for my voice, although I do have a pretty thick accent.

This might have to be a paid feature, though (using Neos Credits or a pro subscription), if a cloud API is used. I might also see if it's possible to use Windows' API somehow. I found a way to use their inking API (I want to do recognition for that too).

sirkitree commented 6 years ago

"Do you have a suggestion for a good API, though?"

I've only ever messed with browser-based ones through Chrome and Firefox, which have a fairly standardized API. I'm not sure if they use something underneath.

I did see a video about Google Assistant today from Google I/O that sounds pretty amazing, but I don't know if that is accessible from an API.

sirkitree commented 6 years ago

IBM Watson has a service that could be integrated. In fact, if that were to happen, it might pave the way to utilizing many of their other services.

Simulacron3 commented 6 years ago

I think a voice input interface is the future. Speech to text opens up a voice command interface as well. IBM has a demo app available on Viveport and probably elsewhere:

"What if you could use your voice to powerfully affect your environment in Virtual Reality? With this tech demo from IBM, showcasing the Watson Developer Cloud using our SDK for Unity, you can. Create, modify and destroy objects using only your voice in an immersive sandbox world. The purpose of this demo is to spark ideas with VR developers by showcasing how easy it is to build conversational interfaces in VR using two Watson services: Speech to Text and Conversation. If you’re interested in learning how, or simply rebuilding this demo yourself, just go to https://ibm.biz/watson_vr for a comprehensive how-to guide. "

Frooxius commented 6 years ago

@sirkitree Yeah, I'd definitely love to integrate more of the cognitive services in the future. Google and Azure offer more of these as well. But the question is which one will perform best.

I have tried the IBM Watson speech to text, but for my own voice it also tends to produce nonsensical transcriptions, for example "a powerful metaverse engine for the virtual reality" came out as "The new years but the performance of ascension for virtual reality."

It's probably much better for native speakers without an accent, but makes it hard for me to evaluate and test those services (plus limits the userbase for now).
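
One way to make a comparison between services less subjective is to score each transcript against what was actually said. The sketch below computes word error rate (WER), a common metric for this, using the sentence and the Watson output quoted above. The function names and the normalization (lowercasing and stripping punctuation) are assumptions made for illustration and are not something the thread specifies.

```python
import string


def words(text: str) -> list[str]:
    """Lowercase the text, strip ASCII punctuation, and split on whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = words(reference), words(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Sentences taken from the comment above.
reference = "a powerful metaverse engine for the virtual reality"
hypothesis = "The new years but the performance of ascension for virtual reality."
print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

A WER of 0 means a perfect transcript, and values above 1 are possible when the service inserts many extra words. Running the same recordings from several speakers through each candidate service and comparing the scores would give a rough ranking that does not depend on any one person's accent.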

Simulacron3 commented 6 years ago

"It's probably much better for native speakers without an accent, but makes it hard for me to evaluate and test those services (plus limits the userbase for now)."

I support the idea that core features should be as universal as possible, considering a worldwide user base, but this and some other thinking since yesterday made me wonder about a plug-in system. Does Neos have one? I think features like this would be good concepts for implementation as a plug-in. (I'm a total newbie here at this point, but an excited one.)

sirkitree commented 6 years ago

Came across this tonight: https://github.com/watson-developer-cloud/unity-sdk

Frooxius commented 5 years ago

Another solution that could be used: https://github.com/mozilla/DeepSpeech/blob/master/README.md

sirkitree commented 3 years ago

With AudioX in place, perhaps this could be picked up again?

Frooxius commented 3 years ago

I'm sorry, but it's mostly unrelated to AudioX. The main problem here is integrating a good voice-to-text system, service, or library, rather than being able to manipulate the audio data, and there are a lot of things to build around it as well. AudioX doesn't really help there.

sirkitree commented 3 years ago

I've seen a few tools that are using the standard Windows voice input system. I know Neos supports more than Windows, but I wonder if it could be a start? Offhand, I know that Modbox and Noda are using it, and I could get more details on its use.

Frooxius commented 3 years ago

It's a possibility, but from what others have tried and from my own experience, the system built into Windows doesn't work too well compared to some services available online, so I'm not sure if it's worth the engineering effort. E.g. I've always had trouble getting it to work due to my accent, and I know Anomalous was experimenting with the voice translation and was getting substantially worse results compared to the cloud service.

Swingly6061 commented 3 years ago

Yeah I use Windows' voice recognition for some voice commands and it is not great for general sentence input.

ProbablePrime commented 3 years ago

Earthmark opened #2809, which has some discussion on this. We're continuing the conversation here on #85, but please ensure you've read Earthmark's issue for additional commentary.

Thanks!

Frooxius commented 3 years ago

Just to reiterate, this is mainly hinging on having some good Speech to Text API that we can integrate. There are a few choices there:

1) We use a system API for this (like Windows Speech To Text). In my experience this produces really poor quality results, especially if you have an accent (like me, I never got it to produce something usable).
2) We find some free library that can do the processing locally. I'm not aware of any that produces good results, so suggestions are welcome.
3) We use cloud services, like Microsoft Azure Cognitive Services. Those charge per hour of processed audio, meaning our costs would grow the more users use it. We'd have to tie it into some Patreon perk or paid service as a result.
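
To make the trade-off between these three options concrete, here is a minimal sketch of how a client could choose between them at runtime. Everything in it is hypothetical: the `Backend` class, the `pick_backend` function, the backend names, and the cost figure are placeholders for illustration and do not reflect how Neos is actually built or what any service charges. The stub `transcribe` method does no real recognition.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Backend:
    """Hypothetical description of one speech-to-text option."""
    name: str
    available: bool                    # e.g. OS support, model installed, network up
    cost_per_audio_hour: float = 0.0   # placeholder; 0 means free to operate

    def transcribe(self, audio: bytes) -> str:
        # Stub only: a real backend would call a system API, a local
        # library, or a cloud service here.
        return f"[{self.name} transcript of {len(audio)} bytes]"


def pick_backend(backends: Sequence[Backend], user_has_paid_perk: bool) -> Optional[Backend]:
    """Return the first usable backend in preference order, or None."""
    for backend in backends:
        if not backend.available:
            continue
        # Backends that cost money per hour are only offered to paying users.
        if backend.cost_per_audio_hour > 0 and not user_has_paid_perk:
            continue
        return backend
    return None


# Preference order: best expected quality first. All values are made up.
backends = [
    Backend("cloud", available=True, cost_per_audio_hour=1.0),
    Backend("local-library", available=False),
    Backend("system-api", available=True),
]

chosen = pick_backend(backends, user_has_paid_perk=False)
print(chosen.name if chosen else "no backend available")
```

With this ordering, a user with the paid perk would get the cloud backend, while everyone else would fall back to whichever free option is available on their platform.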