Rantanen / node-mumble

Mumble client in Node.js
MIT License

Positional Audio #125

Closed · vitordvr closed this 2 years ago

vitordvr commented 2 years ago

I was taking a look at the wiki about positional audio, but I didn't find anything referring to the Mumble InputStream. Is it possible somehow to send audio to the Mumble server with a vector3?

Rantanen commented 2 years ago

Not in the provided API. The input stream is also a bit of a tricky abstraction for it. Mumble doesn't handle positional audio as some hypothetical multichannel audio; instead the audio is still encoded as a mono/stereo stream. The positional information is then added as extra metadata on the audio packets.

The input stream eats PCM audio, splits it into audio frames and encodes them. These encoded frames are then combined into a Mumble audio packet based on the packet size settings. That audio packet has an optional field for the positional data.
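As a rough illustration of that framing step (not the actual node-mumble code; the 48 kHz / 10 ms / 16-bit values are illustrative assumptions, the real constants live in MumbleConnection and its settings):

```js
// Illustrative framing sketch, assuming 48 kHz mono 16-bit PCM and 10 ms frames.
var SAMPLE_RATE = 48000;
var FRAME_MS = 10;
var SAMPLES_PER_FRAME = SAMPLE_RATE * FRAME_MS / 1000;  // 480 samples
var BYTES_PER_FRAME = SAMPLES_PER_FRAME * 2;            // 16-bit samples

function splitIntoFrames( pcm ) {
    var frames = [];
    for( var offset = 0; offset + BYTES_PER_FRAME <= pcm.length; offset += BYTES_PER_FRAME ) {
        frames.push( pcm.slice( offset, offset + BYTES_PER_FRAME ) );
    }
    // Each frame is then encoded, and a packet's worth of encoded frames
    // is combined into one audio packet.
    return frames;
}
```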

The issue here is that the input stream doesn't expose details or an API for packet boundaries, so there is no way to tell when the packet changes and positional data could be sent. Associating positional data with each sample would be quite expensive.

At the MumbleConnection level the packets are formed in the sendEncodedFrame function:

https://github.com/Rantanen/node-mumble/blob/3387e1232dfec897c1d4b34015b1bf129625b08d/lib/MumbleConnection.js#L386-L396

The protocol docs indicate the positional data should follow the frames: https://mumble-protocol.readthedocs.io/en/latest/voice_data.html#encoded-audio-data-packet

Writing three float coordinates after the frames on line 396 would take care of that (along with ensuring the length info on line 390 takes the added data into account).
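To make that concrete, here is a rough sketch of the change. `header` and `frames` stand in for the actual locals built on those lines, `position` is assumed to be an `{ x, y, z }` object, and the float byte order is an assumption that should be verified against the protocol docs or the reference client:

```js
// Hypothetical sketch: append positional data to the outgoing audio packet.
var positional = Buffer.alloc( 12 );
positional.writeFloatLE( position.x, 0 );  // Byte order is an assumption;
positional.writeFloatLE( position.y, 4 );  // check the protocol docs or the
positional.writeFloatLE( position.z, 8 );  // reference client to be sure.

// The length info on line 390 must account for the extra 12 bytes.
var packet = Buffer.concat([ header, frames, positional ]);
// ...the packet is then sent exactly as before.
```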

A possible API for this would have the current position as state on the stream, which the user would keep up to date. Then, when the stream queues the frames on the connection by invoking connection.sendVoiceFrame, it would also pass the position data as a parameter. sendVoiceFrame would then pass the parameter on to sendEncodedFrame. This would spare the stream from having to expose the packet boundaries, and the user from having to pass the location data with every sample.
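A sketch of how that could look from the user's side; the `position` property and the extra sendVoiceFrame parameter are hypothetical and don't exist in node-mumble today:

```js
// Hypothetical API: the user keeps the position as state on the stream.
var input = connection.inputStream();
input.position = { x: 1.5, y: 0.0, z: -2.0 };  // updated from the game loop

// Inside the stream, when a packet's worth of frames is queued
// (hypothetical extra parameter on sendVoiceFrame):
//   this.connection.sendVoiceFrame( frames, whisperTarget, this.position );
// sendVoiceFrame would forward the position to sendEncodedFrame, which
// appends it to the packet as sketched above.
```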

An alternative to exposing the location as a field on the stream object would be to support some in-stream metadata that could be written to the stream through the normal write APIs; the stream would update its internal state based on that, and writing the location data would then proceed as above. I don't see much point in this, though.

Now, unfortunately, not only is this project not maintained anymore, but I also lost the use of my other hand a few months ago, so I'm not going to be able to do any maintenance currently. The above instructions should enable you to make the necessary changes if you need them.