RevoluPowered / one-voip-godot-4

One voip plugin to rule them all
MIT License

Allow jitter buffer to be implemented in GDScript #22

Open goatchurchprime opened 4 months ago

goatchurchprime commented 4 months ago

I think the current use of a jitter buffer obfuscates what should be a fairly transparent process that would allow for synchronicity with animations, events and visemes.

Capturing: If you expose the VOIPInputCapture::_sample_buf_to_packet() function to GDScript then we can use a base AudioEffectCapture object and feed the VOIPInput object that wraps the Opus library like so:

while voipinputcapture.get_frames_available() >= 441:
    var samples = voipinputcapture.get_buffer(441)
    var opuspacket = voipinputcapture._sample_buf_to_packet(samples)
    transmit(opuspacket) # or packets.append(opuspacket)

For a bonus, _sample_buf_to_packet() could repacketize multiple Opus frames into a single Opus packet when the number of samples is a multiple of 441. (By the way, 441 audio frames is 10 ms of sound at 44100 Hz, which is resampled up to the 480 audio frames at 48000 Hz that the Opus library requires.)

The reason the current design of send_test_packets -> emit signal packet_ready is unsatisfying is that we lose track of which packet corresponds to which time window since it's going out to a different callback function instead of returning to the caller.

The output Opus stream stutters for the very simple reason that there is a mismatch between _process() frame rate (60fps) and audio encoding rate (100fps).
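To make the mismatch concrete, here is a rough sketch (in Python rather than GDScript, so it runs outside Godot) that counts how many 10 ms packets fall due during each rendered frame. It assumes perfectly regular frame timing, which a real game will not have, and the function name is made up for illustration.

```python
def packets_per_frame(frame_rate=60, packet_rate=100, frames=6):
    """Count how many 10 ms packets become due during each rendered frame."""
    counts = []
    sent = 0
    for frame in range(1, frames + 1):
        # Packets whose 10 ms window has fully elapsed by the end of this frame.
        due = (frame * packet_rate) // frame_rate
        counts.append(due - sent)
        sent = due
    return counts


print(packets_per_frame())  # [1, 2, 2, 1, 2, 2]
```

The count alternates between one and two packets per frame, so anything that handles exactly one packet per _process() call falls behind. Draining in a loop, as in the capture snippet above, avoids that.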

Playback

I exposed a copy of the push_packet() function that returned the uncompressed samples: PackedVector2Array AudioStreamVOIP::spush_packet(const PackedByteArray& packet)

The GDScript code for managing and playing the incoming stream is:

setup:

var staticvoipaudiostream = ClassDB.instantiate("AudioStreamVOIP")
$AudioStreamPlayer.stream = ClassDB.instantiate("AudioStreamGenerator")
$AudioStreamPlayer.play()
var playbackthing = $AudioStreamPlayer.get_stream_playback()

processing:

opuspacketsbuffer = [ ... ]  # filled from the network
while playbackthing.get_frames_available() > 441 and len(opuspacketsbuffer):
    # pop_front() removes the oldest packet; GDScript arrays have no pop(0)
    var frames = staticvoipaudiostream.spush_packet(opuspacketsbuffer.pop_front())
    playbackthing.push_buffer(frames)

The docs warn that AudioStreamGeneratorPlayback.push_buffer() is slow in GDScript, so the implementation of AudioStreamVOIP is probably good, if we also have a get_opus_frames_available() to tell us how many empty slots are left in the jitter buffer.

It would also be a good idea to know how many frames are left in the buffer so we don't get the "NOT ENOUGH SAMPLES - frames:" error. However I can't see a good function in the Godot libraries to base it on, other than float _get_playback_position().

Additionally, features like Forward Error Correction (FEC, where the decoder can fill in for missing Opus packets) have to be managed outside of this library since the Opus packets don't contain sequence numbers, so you have to tell the library when a packet is missing.
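As a sketch of what managing this outside the library could look like, the sender can prefix each Opus packet with its own sequence number and the receiver can count the gaps. This is Python for illustration only; the helper names and the 2-byte header are assumptions, not part of the plugin or of Opus.

```python
import struct


def wrap_packet(seq, opus_packet):
    """Prefix an Opus packet with a 16-bit big-endian sequence number."""
    return struct.pack(">H", seq & 0xFFFF) + opus_packet


def unwrap_packet(data):
    """Split a wrapped packet back into (sequence number, Opus payload)."""
    (seq,) = struct.unpack(">H", data[:2])
    return seq, data[2:]


def count_missing(last_seq, seq):
    """Number of packets skipped between last_seq and seq, with 16-bit wraparound."""
    return (seq - last_seq - 1) & 0xFFFF
```

A non-zero count_missing() result is the cue to tell the decoder that a packet was lost. Duplicate or reordered packets show up as very large counts here, so a real receiver would need to filter those out first.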

There's also a DTX (Discontinuous Transmission) feature that puts out 400ms long frames when there is silence. Otherwise the Opus library assumes that everyone is talking all the time everywhere like it's a video conference where the only purpose of being online is to talk. This is not how we play networked games, where there is something other than just talking as an activity, and a VOX (Voice operated push to talk) system would be more appropriate, as well as being kinder on the bandwidth, and considerably more scalable.
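A minimal sketch of the VOX idea, again in Python for illustration: transmit a 10 ms frame only if its peak amplitude crosses a threshold, and keep transmitting for a few frames afterwards so word endings aren't clipped. The threshold and hangover values are arbitrary placeholders, and the function is hypothetical rather than anything the plugin provides.

```python
def vox_gate(frame_peaks, threshold=0.05, hangover_frames=3):
    """Return, for each frame's peak amplitude, whether that frame should be sent."""
    decisions = []
    remaining = 0
    for peak in frame_peaks:
        if peak >= threshold:
            # Speech detected: send this frame and re-arm the hangover counter.
            remaining = hangover_frames
            decisions.append(True)
        elif remaining > 0:
            # Quiet frame inside the hangover window: still send it.
            remaining -= 1
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions
```

Frames gated off this way are never encoded or sent, which is where the bandwidth saving comes from.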

marc-weber1 commented 4 months ago

I'm worried about this making the interface a lot more complicated than it needs to be - send_test_packets is a temporary function that I was going to replace at some point with a voip handling thread (almost) completely hidden from the user

I am not against extensibility, but something as low level as swapping out the jitter buffer could probably require a recompile, while I was hoping the actual godot interface would just be the most frictionless way for someone with an online FPS to get people talking to each other without having to figure out any of this

marc-weber1 commented 4 months ago

just to make it easier for me to understand:

imagine if all test packets were sent by a thread automatically that checked for new packets dependent on the sample rate, e.g. 100 times per second, and all you have to do is tell the voip instance which connection and channel to send the packets over

Imagine if the test packets are received by an ok sounding jitter buffer like discord's in case there's packet loss for a few seconds

What use case do custom gdscript jitter buffers solve that the above design doesn't? When would someone want this?