VoIP: streaming microphone input to other players? #18133

Closed. DEF7 closed this issue 4 years ago.

DEF7 commented 6 years ago

I'm working on a multiplayer VR game in Godot, and one of the must-have features when you're in the presence of other players in a virtual world is being able to speak directly to one another. I see that Godot still lacks microphone support, but it also lacks any way to stream audio over the network.

I've been mulling over how this functionality should best fit into Godot's existing audio and network interfaces. In many instances players could have an AudioStream associated with them, or perhaps a new type (a NetAudioStream), and their physics body would have a child AudioStreamPlayer3D to emit their voice. There's also the possibility of players radioing their team or other players, which means non-spatial playback of a network-inbound audio stream.

Also, there's the question of how devs will want this to actually work. Should there be more fine-grained control? Should there even be any of this high-level stuff, or should there just be microphone input and audio compression, leaving devs to manually transmit the compressed output across the network themselves?

I'm still learning how Godot does things, so I'm a bit fuzzy on how this functionality would best be exposed, but judging by what I've read in the docs so far it looks like it would fit right into the existing audio system, and then it's just a matter of how devs want players to communicate (i.e. in person, via radio, with their team only, with everybody, etc.). I'm just wondering what would best support the largest variety of possibilities. Maybe I want to add a radio-static effect or distortion over a player's incoming 'radio' voice stream, based on some raytrace I did that determined there's a bunch of stuff in the way causing interference.

Thanks!

DEF7 commented 6 years ago

Thoughts: individual players' audio-in streams would be broadcast over the network to the game server, which would then be responsible for routing the streams where they need to go. This would operate on new NetworkAudioStream 'channels' which AudioStreamPlayer nodes would dial into in order to subscribe and receive anything from the game server.

A player's audio-in stream could be routed to one or more NetworkAudioStream channels simultaneously, permitting them to radio their team through an established team-radio NetAudioStream while still being overheard and eavesdropped on by anybody near their actual scene player node, which would have another AudioStreamPlayer2D/3D streaming from the NetAudioStream that represents that player's 'voice' in the world, emanating from their character. If multiple players' audio streams try to overwrite each other on something like a shared team-radio NetAudioStream, perhaps there could be an option to mix them together, or just a first-come-first-served rule so that the channel locks onto whoever sends audio first (i.e. pushes the transmit button on their radio).
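
A minimal sketch of that server-side routing idea, assuming a hypothetical subscriber table (none of these names exist in Godot; the send callback is passed in):

#include <cstdint>
#include <functional>
#include <map>
#include <set>
#include <vector>

using PeerId = int;
using ChannelId = int;

// Who is dialed into which NetworkAudioStream channel.
static std::map<ChannelId, std::set<PeerId>> subscribers;

// Forward one player's compressed voice packet to every subscriber of a
// channel, skipping the sender so they don't hear their own voice echoed.
void route_voice(PeerId sender, ChannelId channel,
                 const std::vector<uint8_t> &opus_packet,
                 const std::function<void(PeerId, const std::vector<uint8_t> &)> &send_to_peer) {
    for (PeerId peer : subscribers[channel])
        if (peer != sender)
            send_to_peer(peer, opus_packet);
}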

hungrymonkey commented 6 years ago

I kinda already made a hacky implementation.

The thing that really annoys me is that I need multiple network stacks, because the current ENet implementation doesn't really provide a way to manage multiple ports.

I solved the NetworkAudioStream problem by making it a lockless buffer and creating one per character (see the sketch below).

You should look into Godot servers to handle the network on a separate thread.

https://godotengine.org/article/why-does-godot-use-servers-and-rids
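
For reference, a minimal sketch of the kind of single-producer/single-consumer lockless ring buffer described here (one per character); the class name is hypothetical, not taken from any actual branch:

#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

class VoiceRingBuffer {
public:
    explicit VoiceRingBuffer(size_t capacity)
        : data(capacity), read_pos(0), write_pos(0) {}

    // Called from the network thread (single producer).
    bool push(int16_t sample) {
        size_t w = write_pos.load(std::memory_order_relaxed);
        size_t next = (w + 1) % data.size();
        if (next == read_pos.load(std::memory_order_acquire))
            return false; // buffer full, drop the sample
        data[w] = sample;
        write_pos.store(next, std::memory_order_release);
        return true;
    }

    // Called from the audio thread (single consumer).
    bool pop(int16_t &out) {
        size_t r = read_pos.load(std::memory_order_relaxed);
        if (r == write_pos.load(std::memory_order_acquire))
            return false; // buffer empty
        out = data[r];
        read_pos.store((r + 1) % data.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<int16_t> data;
    std::atomic<size_t> read_pos, write_pos;
};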

hungrymonkey commented 6 years ago

You have a choice of using Cap'n Proto or Google protobuf for your VOIP protocol.

I am debating the merits of varint. The lead dev of Cap'n Proto, who also made protobuf, said he regretted the varint.

https://capnproto.org/news/2014-06-17-capnproto-flatbuffers-sbe.html

DEF7 commented 6 years ago

I'm having a hard time following. Are you talking about working in and recompiling the engine, or a plugin? Or worse? Could you provide more details? It sounds like you've made things a bit more complicated than they need to be. libopus is already in the engine's codebase, so you can get the best possible speech compression without adding anything. Just tack the compressed speech data onto the end of all outgoing game-state packets, and you eliminate the need for any extra ports.
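
A toy sketch of that piggybacking idea (purely illustrative framing, not an existing Godot format): append the Opus payload after the game state with a trailing length field, so the receiver can split the packet again from its end:

#include <cstdint>
#include <vector>

std::vector<uint8_t> build_packet(const std::vector<uint8_t> &game_state,
                                  const std::vector<uint8_t> &voice) {
    std::vector<uint8_t> pkt(game_state);
    // Voice payload goes at the end; zero-length when the player isn't talking.
    pkt.insert(pkt.end(), voice.begin(), voice.end());
    uint16_t len = uint16_t(voice.size());
    pkt.push_back(uint8_t(len & 0xFF)); // trailing 2-byte length field,
    pkt.push_back(uint8_t(len >> 8));   // read back from the packet's end
    return pkt;
}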

hungrymonkey commented 6 years ago

You can add your own VOIP protocol as a Godot module.

The annoying part is that the network code is not designed to support two different protocols at the same time, since the SceneTree holds the network and takes control of the ENet peer.

For Opus, you just need to include the headers and it should work. Note that Opus only supports a fixed set of sample rates (48000 Hz being the usual choice) and a fixed set of frame durations (2.5, 5, 10, 20, 40, or 60 ms; the smallest is 120 samples at 48 kHz).
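
A minimal libopus sketch showing those constraints (standard libopus calls; the surrounding buffer handling is up to you):

#include <cstdint>
#include <opus/opus.h>

int err = 0;
// 48000 Hz, mono, tuned for speech.
OpusEncoder *enc = opus_encoder_create(48000, 1, OPUS_APPLICATION_VOIP, &err);

// Frame sizes must be 2.5/5/10/20/40/60 ms; 960 samples = 20 ms at 48 kHz.
int16_t pcm[960] = {};        // raw PCM from the microphone
unsigned char packet[4000];   // compressed output
int bytes = opus_encode(enc, pcm, 960, packet, sizeof(packet));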

hungrymonkey commented 6 years ago

https://github.com/godotengine/godot/issues/13947

Aranir commented 6 years ago

Has there been any development on this? Is there any hope that Godot will get a voice chat plugin anytime soon?

hungrymonkey commented 6 years ago

@Aranir I kinda already wrote one, but it's basically a Mumble copy and I have to refactor it as a Godot server.

I'm waiting for the mic.

My main issue is that I cannot put multiple streams into the current Godot network implementation, so I must use multiple connections, which is wasteful and crash-prone.

Aranir commented 6 years ago

By 'waiting for the mic', are you referring to #19106?

For the multiple streams, is that a known limitation which will be fixed before 3.1?

hungrymonkey commented 6 years ago

@Aranir I can make a mic with SDL2. The mic is a non-issue. The larger issue is that I cannot use the high-level network implementation as-is.

Aranir commented 6 years ago

@hungrymonkey are there any plans to make this possible? I couldn't find anything on the roadmap...

Or will #18827 solve it?

hungrymonkey commented 6 years ago

@Aranir I've already been playing around with my own VOIP for a while now. The largest issue is that I need to refactor mine into a Godot server, and the current network implementation doesn't allow me to split off channels for my own use.

hungrymonkey commented 6 years ago

@Aranir I have to evaluate the change. The way the current networking works is that SceneTree practically dequeues all network packets. I cannot use that behavior, because I need packets to be queued in their own per-channel buffers, which isn't the case right now.
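
Roughly, the needed behavior would look like this (hypothetical names; the point is one queue per channel rather than one global consumer):

#include <cstdint>
#include <map>
#include <queue>
#include <vector>

using Packet = std::vector<uint8_t>;

// One buffer per channel, so the VOIP code can drain its channel
// independently of the game traffic that SceneTree consumes.
static std::map<int, std::queue<Packet>> channel_buffers;

void on_packet(int channel, Packet p) {
    channel_buffers[channel].push(std::move(p));
}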

hungrymonkey commented 6 years ago

I am in the process of refactoring my old code to make it work with the new MultiplayerAPI:

https://github.com/hungrymonkey/godot/tree/up_voip

It doesn't work right now; I have to figure out whether the packets are queued properly.

hungrymonkey commented 6 years ago

I asked reduz; he put this feature on the back burner until the microphone API is done.

marcelofg55 commented 5 years ago

@hungrymonkey Hi, what's the status of this? Have you checked it now that the microphone API is implemented?

hungrymonkey commented 5 years ago

@marcelofg55 I did check. I kinda like a more signal-based approach. Oh well, I will have to do buffering in my tree. I asked reduz a while ago and he basically said to wait a while. I haven't really done much since then. I wouldn't mind refactoring it, but I really don't know if anyone is interested at all.

I still need to figure out a nice way to associate N peer IDs with N audio streams.

fire commented 5 years ago

I got stuck trying to access the input from the record bus effect. Any suggestions? My first approach is to make the record bus effect expose a ring buffer, but I don't know.

hungrymonkey commented 5 years ago

@fire I just made the AudioStream a lockless ring buffer. I think this approach is less complicated, since it integrates with the current Godot design.

// Grab the microphone samples and sample count from the audio driver.
Vector<int32_t> buf = AudioDriver::get_singleton()->get_input_buffer();
unsigned int input_size = AudioDriver::get_singleton()->get_input_size();
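
As a rough follow-up (an assumption about how you'd wire it, not code from an actual branch), inside a module you could drain that input buffer into a per-character ring buffer like the VoiceRingBuffer sketch earlier in this thread:

// Shift the driver's 32-bit input samples down to 16-bit PCM and
// push them into the character's single-producer ring buffer.
for (unsigned int i = 0; i < input_size; i++) {
    voice_buffer->push(int16_t(buf[i] >> 16));
}
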
fire commented 5 years ago

Well, it would be cool if it were possible to record a mic at a certain game location and mix in a bit of the in-game background, but that's a bit tricky.

hungrymonkey commented 5 years ago

What type of mix? If you just make the AudioStream a buffer, you can feed it into a positional audio player and the sound will be output at a certain location.

hungrymonkey commented 5 years ago

@marcelofg55 Sure, I guess I'll learn a bit more. I probably need to register an IRC nickname.

pgruenbacher commented 5 years ago

https://github.com/godotengine/godot/pull/19106 - looks like the mic API is settled for 3.1.

seabassjh commented 5 years ago

@hungrymonkey how do you write to an audio stream?

hungrymonkey commented 5 years ago

@Seabass247 I wrote a guide https://godot.readthedocs.io/en/3.0/development/cpp/custom_audiostreams.html

Luzzotica commented 4 years ago

Hello! I am trying to create a simple VoIP system for a game I am making, and I am not quite following your custom audio streams guide. I have thought about it a bit, and here are my conclusions; please correct me if I am wrong:

  1. We need to be able to compress the information from the mic and send it to all connected users in a stream-like way.
  2. All connected users need to be able to read this stream, decompress it, and play it as sound as they receive it.

I have tested sending larger chunks of speech data, but that doesn't work; the latency is far too high. I haven't tried compressing it yet, but I imagine that would speed things up. However, that doesn't address the 'stream-like' part. I am not sure how to create an audio player that plays data as it is received. Would it be equivalent to setting the stream to each new small sound clip and constantly telling it to play? I imagine this is what was meant previously by 'writing' to an audio stream, but again, I am struggling to follow your guide.

Any help would be greatly appreciated! Or if you have a project I can jump onto and help build so I don't have to start from scratch, I would appreciate that as well. =)

Thanks!

hungrymonkey commented 4 years ago

In order to create a VOIP module, you subclass AudioStream into a custom single-producer/single-consumer lockless buffer. When microphone data is captured, you compress it with Opus and send it as a generic internet packet to the other players using your custom-defined protocol. When a packet is received, you match the packet data to the correct custom AudioStream (sketched below).

You should refer to my Godot server guide to put the VOIP module on a separate thread: https://docs.godotengine.org/en/3.1/development/cpp/custom_godot_servers.html You can look at Mumble as an example of a VOIP protocol.

My largest annoyance was that Godot networking does not have the ability to connect to multiple ports with one peer class. You end up doing strange logic trying to match the game peer and the VOIP peer.
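
A sketch of that receive-side matching, under the same assumptions as the earlier ring-buffer sketch (VoiceRingBuffer and the packet struct are hypothetical; the opus_* calls are standard libopus):

#include <cstdint>
#include <map>
#include <opus/opus.h>

struct VoicePacket {
    int peer_id;                // which player's voice this is
    const unsigned char *data;  // compressed Opus payload
    int size;
};

static std::map<int, OpusDecoder *> decoders;          // one decoder per speaking peer
static std::map<int, VoiceRingBuffer *> voice_buffers; // one custom stream per character

void on_voice_packet(const VoicePacket &pkt) {
    OpusDecoder *&dec = decoders[pkt.peer_id];
    if (!dec) {
        int err = 0;
        dec = opus_decoder_create(48000, 1, &err); // must match the encoder settings
    }
    int16_t pcm[960]; // room for one 20 ms frame at 48 kHz
    int n = opus_decode(dec, pkt.data, pkt.size, pcm, 960, 0);
    VoiceRingBuffer *buf = voice_buffers[pkt.peer_id];
    for (int i = 0; buf && i < n; i++)
        buf->push(pcm[i]); // the audio thread drains this via the custom AudioStream
}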

stranker commented 4 years ago

Hi! I'm currently investigating how to compress the microphone's audio input to Opus and send that data over the network. I really don't know where to start. I know that I can get the input (or recording) from the mic using an AudioEffectRecord, and that gives me an AudioStreamSample. But where do I need to do the compression of that data?

Thanks in advance!

DEF7 commented 4 years ago

Opus is an audio codec. Codec = COmpression DECompression. Just pass the raw PCM audio data into Opus to get the compressed stream data that you then send off over the network.

Just make sure that you're buffering received audio long enough to prevent network transmission jitter/irregularity from causing gaps between received audio chunks. You'll always get a gap here and there no matter what, but you can reduce them for the most part by queuing received chunks for playback at a consistent interval that's offset a few hundred milliseconds after the initial audio chunk was received. There's already a delay incurred by the time spent recording the chunk on the sender's side (which is determined by the size of your chunks, which is in turn determined by the interval at which you send them out), then the network transmission latency, plus the buffering delay you add on top of that to smooth out the jitter.

Let's say you're buffering for 150 ms and the first chunk of a stream arrives at 29.300 s; you'd play it at 29.450 s. Say chunks are sent out every 50 ms, and are therefore 50 ms long: you should be able to queue up at least 2 more chunks while a 3rd one, the oldest in the queue, is already playing. Ideally you'd receive the next two chunks at ~29.350 and ~29.400, and have a 4th chunk arriving right as you're playing back the Opus-decoded 1st chunk (decode on receive and queue up for playback).

I'd create a 'voicecast' container object for the queue whenever a player starts broadcasting audio, to keep track of timing and play back chunks (a minimal sketch of one follows below). It is destroyed upon receipt of the last chunk, which would include a 'destroy' flag, or auto-destroyed if a last chunk hasn't been received for more than a few chunks' worth of delay. You could send a timestamp with each chunk and base your delayed playback timing on that, but it really doesn't have to be that complicated; just assume all chunks should play at times derived from the initial chunk's arrival and the instantiation of the voicecast object. Also, keep track of the player IDs responsible for the inbound audio chunks! You could store your voicecast container object in your player objects to keep it organized.

You could initially just play chunks as they're received to get things up and running, which will sound poppy and glitchy but should work for the most part, and then do a 2nd pass on your code and add in the queuing/buffering. Or, if you feel comfy enough to plan it out and do it all at once, go ahead and do that. Anyway, hope something here helps. Good luck!
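
A minimal sketch of such a voicecast buffer, using the 150 ms / 50 ms numbers from above (all names hypothetical):

#include <cstdint>
#include <deque>
#include <vector>

struct Voicecast {
    static constexpr double buffer_delay = 0.150; // 150 ms of jitter buffering
    static constexpr double chunk_len = 0.050;    // 50 ms chunks
    double first_arrival = 0.0;                   // time the first chunk arrived
    std::deque<std::vector<int16_t>> queue;       // decoded chunks, oldest first
    int chunks_played = 0;

    void receive(std::vector<int16_t> decoded_pcm, double now) {
        if (queue.empty() && chunks_played == 0)
            first_arrival = now; // all later play times derive from this
        queue.push_back(std::move(decoded_pcm)); // decode on receive, queue for playback
    }

    // Hands out the next chunk once its scheduled play time has arrived.
    bool next_chunk(double now, std::vector<int16_t> &out) {
        if (queue.empty())
            return false;
        double play_at = first_arrival + buffer_delay + chunks_played * chunk_len;
        if (now < play_at)
            return false; // not time yet; the delay absorbs network jitter
        out = std::move(queue.front());
        queue.pop_front();
        chunks_played++;
        return true;
    }
};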

stranker commented 4 years ago

Thanks for your quick response! My main issue now is where I need to compress the raw data that I get from the AudioEffectRecord. I know that get_recording() retrieves a sample with all the data, but it depends on the audio format (8-bit, 16-bit, IMA-ADPCM [not implemented]). So my question is: do I need to create a new FORMAT_OPUS on AudioStreamSample in order to compress that audio? Or am I missing something?

Thanks in advance!

Wapit1 commented 4 years ago

Any progress? I am working on a project which needs voice chat (also a multiplayer VR game).

Calinou commented 4 years ago

@Wapit1 As far as I know, nobody is currently working on a VoIP implementation.

hungrymonkey commented 4 years ago

@Wapit1 Just a fair warning: I ran into an annoyance where ENet manages only one socket. When you load your VOIP Godot server, you will end up syncing between two different game IDs. You can mitigate it by forking Godot's ENet module and making it open two different sockets.

Wapit1 commented 4 years ago

@hungrymonkey How do you make a VOIP server? Any link to documentation? From what you said, it seems to require creating two servers (on different ports), one handling the game and the other the voice chat.

hungrymonkey commented 4 years ago

https://docs.godotengine.org/en/3.1/development/cpp/custom_godot_servers.html https://godot.readthedocs.io/en/3.0/development/cpp/custom_audiostreams.html You will need to combine these two docs. You will also need to fork ENet: https://github.com/godotengine/godot/tree/master/modules/enet

Wapit1 commented 4 years ago

Has no one made a working voice chat system for Godot before? Learning C++ is a bit out of my scope, as I barely master GDScript. How far did you get in your attempt at making it?

hungrymonkey commented 4 years ago

@Wapit1 I made a basic one-channel VOIP work, but I ended up disliking my implementation because I was creating two network trees and syncing data between them. It made both the GDScript and the C++ code ugly.

Wapit1 commented 4 years ago

@hungrymonkey It is still better than no VOIP. Any plan to make the project open source? I'm pretty sure some Godot contributors could lend a hand.

hungrymonkey commented 4 years ago

@Wapit1 I would have to clean it up with the addition of the mic. I did not mind open-sourcing the project at the time I made it, but reduz was not that interested a few years ago. I will need to find time to clean it up to make it remotely acceptable.

Wavesonics commented 4 years ago

Has anyone investigated wrapping an existing VOIP service using GDNative?

This one looks like it has a pretty generous free tier and a fully cross-platform SDK: https://www.vivox.com/features

I might take some time and investigate this to see if it could be integrated with Godot.

nonchip commented 4 years ago

just gonna leave that here...

Just sayin', it would probably make sense to interface with the standardized, non-proprietary, cross-platform, open-source VoIP tech that's already in use everywhere anyway (WebRTC).

And wrapping the required libwebrtc / JavaScript functions doesn't seem too impossible a task.

bestdani commented 4 years ago

My quick approach to getting VoIP into my multiplayer project with spatial 3D audio output support, for now, is to use the Godot Python binding together with https://github.com/spatialaudio/python-sounddevice/ (a PortAudio wrapper) and https://github.com/orion-labs/opuslib (a thin wrapper around libopus).

A rather simple Python class, represented as a node in Godot, is responsible for getting raw microphone data from the PortAudio wrapper and feeding it directly into libopus, resulting in chunks of encoded Opus data.

This data can then be picked up by some node with a GDScript on it, which transfers the Opus data to remote machines using an rpc_unreliable call (to keep latency low in my use case). On the remote machines the data is fed into another node with a Python script on it that internally calls libopus and fills a PoolRealArray; this data is then used to drive an AudioStreamGenerator.

This was just a prototype implementation, but it seems to work well enough for now that I'm considering just keeping it like this and carrying the Python overhead with me instead of diving into how to do this with GDNative.

This is the essential GDScript code that's left with this approach to enable the VoIP functionality:

# Nodes with a python script attached that wrap around portaudio and libopus
onready var voip_input = $"voip/sound input"
onready var voip_decoder = $"voip/decoder"

# some AudioStreamPlayer3D with a AudioStreamGenerator attached to it
onready var output := $speaker0

func _process(delta):
    var encoded_audio = voip_input.get_data()
    if len(encoded_audio) > 0:
        rpc_unreliable("_new_audio_data", encoded_audio)

remotesync func _new_audio_data(encoded_audio: PoolByteArray):
    var result = voip_decoder.decode(encoded_audio)
    # push_buffer() might also work, and theoretically even perform better
    for frame in result:
        output.get_stream_playback().push_frame(Vector2.ONE * frame)

From this approach I would conclude that it would only take some small changes to what's exposed to GDScript via the API to allow implementing VoIP with very simple GDScript code like this.

If one could, for example, just get encoded Opus audio from any audio bus (similar to how the record effect works), then sending these data chunks using some RPC call and pushing them into some AudioStreamOpus playback would be very simple to use, IMHO.

*Replace Opus with any other audio codec suitable for this approach, or even allow sending raw audio data.

bestdani commented 4 years ago

On another note: I'm aware of this VoIP demo project (https://github.com/cbarsugman/godot-voip-demo). Unfortunately it produces cracks and pops, I guess caused by the involved (de)activation of the microphone effect recording. While testing this approach in a more sophisticated application, I also suffered from Godot engine crashes under Windows 10 which seemed to stem from the microphone effect recording in combination with short latency times. Unfortunately I could not find the exact point of failure, and I don't have any crashing project files anymore (but maybe someone else can confirm these crashes?).

KoBeWi commented 4 years ago

We are moving proposals to the Godot proposals repository. There's already a proposal for this feature (https://github.com/godotengine/godot-proposals/issues/870), and it has this issue linked. Any further discussion should be moved there.