breakfastquay / rubberband

Official mirror of Rubber Band Library, an audio time-stretching and pitch-shifting library.
http://breakfastquay.com/rubberband/
GNU General Public License v2.0

Pitch-shifting in real-time mode seems to cause audio dropouts at the start of a stream #54

Closed psobot closed 2 years ago

psobot commented 2 years ago

Hi @cannam! Thanks for Rubber Band - big fan of the library.

I've recently helped integrate Rubber Band into Pedalboard, a Python audio effects library I maintain, but I'm having a bit of trouble with it. Namely, I'm using Rubber Band in real-time mode with threading disabled to fit into a plugin chain, and I find that with modest pitch shifts (1.25x), I'm getting audio dropouts at the start of processing.

For my test case, I'm using setPitchShift(1.25) and running Rubber Band with the following options:

RubberBandStretcher::OptionProcessRealTime 
RubberBandStretcher::OptionThreadingNever
RubberBandStretcher::OptionChannelsTogether
RubberBandStretcher::OptionPitchHighQuality

When passing in fixed-size blocks of 512 samples (44.1kHz, stereo) to process, followed by immediate calls to available and retrieve, I get the following sequence of log lines from my test harness:

Rubber Band getLatency() reports 819 samples of latency.

Pushed 512 samples into Rubber Band.
Pulled 0 samples out of Rubber Band (0 were available).
Pushed 512 samples into Rubber Band.
Pulled 0 samples out of Rubber Band (0 were available).
Pushed 512 samples into Rubber Band.
Pulled 0 samples out of Rubber Band (0 were available).
Pushed 512 samples into Rubber Band.
Pulled 117 samples out of Rubber Band (117 were available).
Pushed 512 samples into Rubber Band.
Pulled 215 samples out of Rubber Band (215 were available).
Pushed 512 samples into Rubber Band.
Pulled 213 samples out of Rubber Band (213 were available).
Pushed 512 samples into Rubber Band.
Pulled 213 samples out of Rubber Band (213 were available).
Pushed 512 samples into Rubber Band.
Pulled 213 samples out of Rubber Band (213 were available).
Pushed 512 samples into Rubber Band.
Pulled 212 samples out of Rubber Band (212 were available).
Pushed 512 samples into Rubber Band.
Pulled 224 samples out of Rubber Band (224 were available).
Pushed 512 samples into Rubber Band.
Pulled 435 samples out of Rubber Band (435 were available).
Pushed 512 samples into Rubber Band.
Pulled 512 samples out of Rubber Band (637 were available).
Pushed 512 samples into Rubber Band.
Pulled 512 samples out of Rubber Band (542 were available).
Pushed 512 samples into Rubber Band.
Pulled 512 samples out of Rubber Band (650 were available).
Pushed 512 samples into Rubber Band.
Pulled 512 samples out of Rubber Band (549 were available).
... (all subsequent log lines show at least 512 samples being supplied)

Only after supplying 2,048 samples do I start to get output, even though the stretcher reports only 819 samples of latency. Once output does start, it arrives in patches: each of the next several calls reports fewer samples available than were provided, until about 128ms of audio has been passed in. This results in audible dropouts at the start of any audio file that's been processed.
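As a rough check on the figures quoted above, the following sketch tallies the log. The "available" counts are copied from the log lines; the variable and function names are illustrative only and are not part of Rubber Band or Pedalboard.

```python
BLOCK = 512          # samples pushed per call, from the log
SAMPLE_RATE = 44100  # Hz, from the test description

# "were available" counts after each push, copied from the log above.
available = [0, 0, 0, 117, 215, 213, 213, 213, 212, 224, 435, 637, 542, 650, 549]

def first_call_where(predicate):
    """Return the 1-based index of the first call whose count satisfies predicate."""
    for call, count in enumerate(available, start=1):
        if predicate(count):
            return call
    return None

first_output = first_call_where(lambda n: n > 0)     # first call with any output
first_full = first_call_where(lambda n: n >= BLOCK)  # first call with a full block

samples_before_output = first_output * BLOCK
short_until_ms = round((first_full - 1) * BLOCK / SAMPLE_RATE * 1000)

print("first output after", samples_before_output, "samples pushed")
print("blocks come up short for the first", short_until_ms, "ms of input")
```

With the logged values this gives 2048 samples before the first output and roughly 128 ms of input before a call returns a full 512-sample block.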

I feel like I must be missing something obvious here - am I using Rubber Band incorrectly? Is there a way to ensure that Rubber Band supplies a constant audio stream (without any dropouts) when feeding it a constant stream of input? (Thanks for your help!)

cannam commented 2 years ago

Hi Peter - getLatency reports the number of samples by which the (modified version of the) audio at the start of the input is delayed, in the output sample stream. That is the number of samples that must be clipped from the start of the output, in order for input and output to align.

(I regret using the word latency for this - it matches typical usage within DAWs for example, but in the wider world something like "output delay" would have been clearer.)
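The alignment described above can be sketched as follows. This is a minimal illustration that uses a plain Python list as a stand-in for the output stream; `align_to_input` is a hypothetical helper, not a Rubber Band or Pedalboard function, and the 819 figure is simply the value from the log.

```python
def align_to_input(output, latency):
    """Drop the first `latency` samples so that output[i] lines up with input[i]."""
    return output[latency:]

latency = 819  # value reported by getLatency() in the log above

# Stand-in output stream: `latency` samples of delay followed by the signal itself.
delayed = [0.0] * latency + [1.0, 2.0, 3.0]

aligned = align_to_input(delayed, latency)
print(aligned)  # [1.0, 2.0, 3.0]
```

Note that this only describes where the signal sits in the output stream; it says nothing about how much input must be supplied before any output appears.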

So this is not the same thing as the number of samples needed to cause processing. That value is reported by getSamplesRequired and will typically be much higher than the latency for the first processing block - the value strictly depends on the starting parameters but in most cases it will be 2048, as you discovered.

To maintain a constant output stream with a constant input rate (when pitch-shifting only - obviously this is not generally practical when time-stretching) it's necessary to buffer up more of your input before presenting it, in order to smooth out the lumps. For a reference you might look at the bundled LADSPA/LV2 pitch-shifter plugin implementation, which also supports dynamic pitch changes and which handles pitch shifts of up to three octaves in either direction.

psobot commented 2 years ago

Thanks @cannam! That's quite helpful. You mention:

"it's necessary to buffer up more of your input before presenting it"

Just to confirm - would it be safe to set the size of this buffer to the initial value returned by getSamplesRequired? I assume that we'll never require more samples to be buffered than that initial value, but that's just a guess on my part.

cannam commented 2 years ago

No, I don't think that would be enough - for example in your log above, after you've pushed the first 2048 + a block of 512, you still don't have 512 samples back to return to your caller.

What happens is that when the stretcher has enough to consume a single frequency-domain block - which is what providing getSamplesRequired number of samples will trigger - then it will process that block, overlap-add the output, and return the part that will not be needed for the next overlap-add, i.e. one output step rather than the whole block. This step size depends on the ratios and also on the content of the audio (because of the way transients are handled), although of course if your time ratio is 1 it will quickly settle down so that you're getting the same amount as at the input, as you also found.
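To illustrate why one processed block yields only one step of output, here is a generic textbook overlap-add sketch with a fixed hop. It is not Rubber Band's implementation: there the step varies with the ratios and the audio content, as described above. The function name and the toy frames are assumptions made for illustration.

```python
def overlap_add_stream(frames, hop):
    """Yield the samples finalized after each frame: one hop, not a whole frame."""
    frame_len = len(frames[0])
    acc = [0.0] * frame_len  # running overlap-add accumulator
    for frame in frames:
        for i, value in enumerate(frame):
            acc[i] += value
        # The first `hop` samples will receive no further contributions.
        yield acc[:hop]
        # Slide the accumulator forward by one hop.
        acc = acc[hop:] + [0.0] * hop

frames = [[1.0] * 4 for _ in range(3)]  # three 4-sample frames, illustrative only
chunks = list(overlap_add_stream(frames, hop=2))
print(chunks)  # [[1.0, 1.0], [2.0, 2.0], [2.0, 2.0]]
```

Each 4-sample frame releases only 2 samples, because the remaining samples still await contributions from the next frame.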

For caution I would probably suggest buffering up the initial 2048 plus another 2048.
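One rough way to sanity-check these figures against the log earlier in the thread is to ask how far the output falls behind the input. The sketch below rests on two assumptions: that each "available" count includes any samples left unretrieved by the previous call, and that the buffering is modelled as a fixed delay budget on the harness side, which is only one possible reading of the advice. It covers only the 15 logged calls, and all names are illustrative.

```python
BLOCK = 512

# "were available" counts after each push, copied from the log.
available = [0, 0, 0, 117, 215, 213, 213, 213, 212, 224, 435, 637, 542, 650, 549]

def produced_per_call(available, block):
    """Recover how many new samples each call produced from the 'available' counts."""
    produced, leftover = [], 0
    for count in available:
        produced.append(count - leftover)
        leftover = count - min(count, block)  # the harness pulls at most one block
    return produced

def max_shortfall(produced, block):
    """Largest gap between samples pushed and samples produced so far."""
    pushed = got = worst = 0
    for n in produced:
        pushed += block
        got += n
        worst = max(worst, pushed - got)
    return worst

worst = max_shortfall(produced_per_call(available, BLOCK), BLOCK)
print(worst)                 # 3790
print(worst <= 2048)         # False
print(worst <= 2048 + 2048)  # True
```

Under these assumptions, the logged portion falls behind by at most 3790 samples. That exceeds a 2048-sample budget but fits within 2048 plus another 2048. Other ratios or other audio may behave differently.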

(This is all really designed for a pull model in which each processing unit can request as many samples as it needs from its supplier - combine that with an internally varying processing rate and the fundamentally long block sizes used and things indeed become a bit trickier in a case like this one.)

cannam commented 2 years ago

Closing as resolved - let me know if this is not appropriate. Thanks!