godotengine / godot-proposals

Add an AudioStreamPlayer property to adjust playback speed without affecting pitch #10574

Open goatchurchprime opened 2 weeks ago

goatchurchprime commented 2 weeks ago

Describe the project you are working on

A VOIP plugin for Godot https://github.com/goatchurchprime/two-voip-godot-4

Describe the problem or limitation you are having in your project

When an audio clip is longer than the time available to play it back in, the only options in Godot are to cut the audio off early or to set pitch_scale to the ratio of the clip's duration to the available time so that it plays back faster. Unfortunately, this also changes the pitch, which usually sounds wrong.
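To put numbers on that trade-off, here is a minimal Python sketch of the ratio described above. The function names and the example durations are made up for illustration and are not part of any Godot API.

```python
import math


def pitch_scale_to_fit(clip_seconds: float, available_seconds: float) -> float:
    """Return the pitch_scale that makes a clip finish in the available time.

    pitch_scale speeds playback up by this factor, and it raises the pitch
    by the same factor, which is the limitation described above.
    """
    if clip_seconds <= 0 or available_seconds <= 0:
        raise ValueError("durations must be positive")
    return clip_seconds / available_seconds


def pitch_shift_semitones(pitch_scale: float) -> float:
    """Convert a pitch_scale factor into a pitch change in semitones."""
    return 12.0 * math.log2(pitch_scale)


# Hypothetical numbers: a 3.0 s clip that must finish within 2.4 s.
scale = pitch_scale_to_fit(3.0, 2.4)
semitones = pitch_shift_semitones(scale)
print(f"pitch_scale = {scale:.3f}, pitch rises by {semitones:.2f} semitones")
```

With these example durations the clip has to play about 1.25 times faster, which raises the pitch by nearly four semitones, easily audible on speech.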

Describe the feature / enhancement and how it helps to overcome the problem or limitation

There is a standard technique, audio time stretching, outlined here: https://en.wikipedia.org/wiki/Audio_time_stretching_and_pitch_scaling

We could have another float value on the AudioStream called playback_rate that applies this time stretching at the given rate and keeps the pitch unchanged as long as pitch_scale is left at 1.

The two values would interact: setting pitch_scale = 0.8 and playback_rate = 1.25 would lower the pitch of the audio sample without changing its speed.
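The interaction can be sketched as a small Python model. It assumes the two factors simply multiply, which matches the example above but is not a confirmed design; playback_rate is the proposed property and does not exist in Godot today, and the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PlaybackSettings:
    # Existing property: scales speed and pitch together.
    pitch_scale: float = 1.0
    # Proposed property (hypothetical): scales speed only, pitch preserved.
    playback_rate: float = 1.0

    def effective_speed(self) -> float:
        """Overall speed-up, assuming the two factors multiply."""
        return self.pitch_scale * self.playback_rate

    def effective_pitch(self) -> float:
        """Overall pitch factor; only pitch_scale contributes."""
        return self.pitch_scale


# The example from the proposal: lower the pitch, keep the duration.
settings = PlaybackSettings(pitch_scale=0.8, playback_rate=1.25)
print(settings.effective_speed(), settings.effective_pitch())
```

Under that assumption, leaving pitch_scale at 1 and changing only playback_rate gives the behaviour requested in the title: a different speed at the original pitch.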

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

There is a library that could do it here, though I haven't evaluated it: https://github.com/dbry/audio-stretch

Also, it appears that this feature is the default in the HTML5 spec, as described here: https://developer.mozilla.org/en-US/docs/Web/API/HTMLMediaElement/playbackRate

This resampling feature can be disabled with the preservesPitch HTML5 setting (see the demo playback widget at the bottom of that page).

Godot could take advantage of the presence of this feature on the HTML5 platform if the Audio playback is handled as a special case.

If this enhancement will not be used often, can it be worked around with a few lines of script?

No. It would need to be a compiled AudioStream plugin, not a script.

Is there a reason why this should be core and not an add-on in the asset library?

This potentially has a wide application wherever audio samples need to be timed exactly. It should just be an option that's always available, just like the AudioStreamPlaybackResampled.mix(), which is already non-trivial in its implementation.

Calinou commented 2 weeks ago

There's an AudioEffectPitchShift resource you can use already on an audio bus. This changes the audio's pitch without changing its speed, so you can change pitch_scale on the AudioStreamPlayer to the inverse of the value used in AudioEffectPitchShift. For instance, set pitch_scale in the AudioStreamPlayer to 0.5 and set pitch_scale in AudioEffectPitchShift to 2.0.

This should allow for changing the playback speed without affecting the pitch. Quality will always suffer more or less when doing this (no stretching algorithm is perfect).
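As a rough illustration of the arithmetic behind this workaround, the Python sketch below derives the two values from a desired speed factor. The helper name is hypothetical; it only computes the numbers you would then enter on the AudioStreamPlayer and on the bus effect.

```python
def workaround_settings(speed: float) -> tuple[float, float]:
    """Return (player_pitch_scale, effect_pitch_scale) for a target speed.

    The player's pitch_scale changes speed and pitch together; the
    AudioEffectPitchShift value is its inverse, so the pitch change cancels.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return speed, 1.0 / speed


# The example from the comment above: half speed at the original pitch.
player_scale, effect_scale = workaround_settings(0.5)
print(player_scale, effect_scale)
```

For speeding up speech, the same helper with a speed above 1 gives an effect value below 1, meaning the effect shifts the pitch back down.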

That said, I agree this makes sense to provide as an AudioStreamPlayer feature so we can use the web APIs on the web export (which is needed for sample-based playback to work).

goatchurchprime commented 2 weeks ago

I did not see that!

It does seem to work in my experiments. The quality is good with an FFT Size of 512 because for the purposes of speeding up speech you lower the pitch and speed up the playback. (For pitch shifting upwards you need a bigger FFT Size or it sounds awful.)

This feature is inconveniently attached to an AudioBus rather than an AudioStream. I think I can make it work if incoming voice streams that need to be sped up to catch up are temporarily routed to a bus that has a fixed AudioEffectPitchShift value running, and then routed back to a normal AudioBus once they have caught up. I don't know how glitchy this will be, but the speedup cycle is a glitch anyway, so it might not matter. (Managing a different AudioBus for each incoming VOIP stream doesn't seem like a good idea.)
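For a sense of how long a stream would stay on the sped-up bus, here is a back-of-the-envelope Python sketch. It assumes incoming audio keeps arriving in real time and that the speed factor is fixed for the whole catch-up; the function name and the numbers are illustrative only.

```python
def catch_up_seconds(backlog_seconds: float, speed: float) -> float:
    """Wall-clock time needed to play through a backlog at a fixed speed.

    While playing at `speed`, the stream consumes `speed` seconds of audio
    per second while one new second arrives, so the backlog shrinks by
    (speed - 1) seconds every second.
    """
    if backlog_seconds < 0:
        raise ValueError("backlog must not be negative")
    if speed <= 1.0:
        raise ValueError("speed must be greater than 1 to catch up")
    return backlog_seconds / (speed - 1.0)


# Hypothetical case: the stream is 0.5 s behind and plays at 1.25x.
print(catch_up_seconds(0.5, 1.25))
```

A modest speed-up therefore keeps the stream on the pitch-shifted bus for several times the length of the backlog, which matters when judging how audible the switch between buses will be.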

If it does have wider use (for subtly adjusting and synchronizing the end points of audio tracks), it feels like this capability belongs in the AudioStreamPlayback system from the user's perspective, so maybe leave this proposal open in case anyone else (as well as me) wants it.