Closed: psobot closed this 2 years ago.
Hi Peter - `getLatency` reports the number of samples by which the (modified version of the) audio at the start of the input is delayed in the output sample stream. That is the number of samples that must be clipped from the start of the output in order for input and output to align.
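To make the clipping concrete, here is a minimal Python sketch. It assumes the latency value was read once before processing started and that output arrives in chunks of arbitrary size (one channel shown; stereo would use one trimmer per channel). `LatencyTrimmer` is a hypothetical helper, not part of Rubber Band or Pedalboard; it simply discards the first `latency` output samples, however many chunks they span.

```python
class LatencyTrimmer:
    """Hypothetical helper: drops the first `latency` samples of a
    streamed output signal, across as many chunks as needed."""

    def __init__(self, latency: int) -> None:
        self.remaining = latency  # samples still to be discarded

    def push(self, chunk: list[float]) -> list[float]:
        # Discard as much of this chunk as is still owed to the latency.
        drop = min(self.remaining, len(chunk))
        self.remaining -= drop
        return chunk[drop:]


# Example with the 819-sample latency from the report and 512-sample chunks.
trimmer = LatencyTrimmer(latency=819)
out1 = trimmer.push([0.0] * 512)  # all 512 samples dropped
out2 = trimmer.push([1.0] * 512)  # 307 dropped, 205 kept
print(len(out1), len(out2))       # 0 205
```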
(I regret using the word latency for this - it matches typical usage within DAWs for example, but in the wider world something like "output delay" would have been clearer.)
So this is not the same thing as the number of samples needed before any processing happens. That value is reported by `getSamplesRequired` and will typically be much higher than the latency for the first processing block. The value strictly depends on the starting parameters, but in most cases it will be 2048, as you discovered.
To maintain a constant output stream with a constant input rate (when pitch-shifting only; obviously this is not generally practical when time-stretching), it's necessary to buffer up more of your input before presenting it, in order to smooth out the lumps. For reference, you might look at the bundled LADSPA/LV2 pitch-shifter plugin implementation, which also supports dynamic pitch changes and handles pitch shifts of up to three octaves in either direction.
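One way to read this advice, sketched in Python under stated assumptions: place a FIFO between the stretcher and your caller, pre-fill it with a fixed amount of silence, and always hand the caller exactly one block per call. The pre-fill is extra fixed latency paid in exchange for a steady stream. `PrebufferedOutput`, `count_underruns` and the chunk sizes in `lumps` are hypothetical; the numbers are made up for illustration and are not measurements of Rubber Band.

```python
from collections import deque


class PrebufferedOutput:
    """Hypothetical FIFO between a lumpy producer and a caller that always
    wants fixed-size blocks. Pre-filling it with `delay` samples of silence
    adds that much fixed latency in exchange for a steady output stream."""

    def __init__(self, delay: int) -> None:
        self.fifo = deque([0.0] * delay)
        self.underruns = 0

    def write(self, samples) -> None:
        # Whatever the stretcher made available on this call (possibly none).
        self.fifo.extend(samples)

    def read(self, n: int) -> list:
        out = [self.fifo.popleft() for _ in range(min(n, len(self.fifo)))]
        if len(out) < n:
            # Ran dry: pad with silence. This is the audible dropout.
            self.underruns += 1
            out.extend([0.0] * (n - len(out)))
        return out


def count_underruns(delay: int, lumps, block: int = 512) -> int:
    """Write each lump, then read one fixed-size block, once per call."""
    buf = PrebufferedOutput(delay)
    for size in lumps:
        buf.write([1.0] * size)
        buf.read(block)
    return buf.underruns


# Made-up per-call output sizes resembling a slow start; not measured data.
lumps = [0, 0, 0, 256, 768, 512, 512, 512]
print(count_underruns(0, lumps))     # 4
print(count_underruns(2048, lumps))  # 0
```

In this made-up sequence the smallest delay that avoids every underrun is 1792 samples, the largest gap between what the caller has asked for and what the producer has delivered so far.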
Thanks @cannam! That's quite helpful. You mention:
> it's necessary to buffer up more of your input before presenting it
Just to confirm - would it be safe to set the size of this buffer to the initial value returned by `getSamplesRequired`? I assume that we'll never require more samples to be buffered than that initial value, but that's just a guess on my part.
No, I don't think that would be enough. For example, in your log above, after you've pushed the first 2048 samples plus a block of 512, you still don't have 512 samples to return to your caller.
What happens is that when the stretcher has enough input to consume a single frequency-domain block (which is what providing `getSamplesRequired` samples will trigger), it processes that block, overlap-adds the output, and returns only the part that will not be needed for the next overlap-add, i.e. one output step rather than the whole block. This step size depends on the ratios and also on the content of the audio (because of the way transients are handled). If your time ratio is 1, though, it will quickly settle down so that you get the same amount out as you put in, as you also found.
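The block/step behaviour can be illustrated with a deliberately crude Python model. This is not Rubber Band's actual algorithm: `ToyStretcher`, its 2048-sample window and its fixed 256-sample hop are assumptions chosen only to show why output appears one step at a time rather than a whole block at a time (real step sizes vary with the ratios and with transients). The loop mimics the reporter's harness of pushing 512 samples and then pulling 512.

```python
class ToyStretcher:
    """Crude stand-in for a block-based stretcher (NOT Rubber Band's real
    algorithm). It waits until `window` input samples are buffered, then
    repeatedly consumes one analysis hop and emits one output step.
    Only sample counts are tracked, not audio."""

    def __init__(self, window: int = 2048, hop: int = 256,
                 time_ratio: float = 1.0) -> None:
        self.window = window
        self.hop = hop
        self.step = round(hop * time_ratio)  # output samples per processed block
        self.buffered = 0  # input samples held for analysis
        self.ready = 0     # output samples available to the caller

    def push(self, n: int) -> None:
        self.buffered += n
        while self.buffered >= self.window:
            self.buffered -= self.hop  # advance the analysis window by one hop
            self.ready += self.step    # hand back one step, not a whole block

    def pull(self, n: int) -> int:
        got = min(n, self.ready)
        self.ready -= got
        return got


# Push 512, record how much is ready, then pull up to 512, six times over.
s = ToyStretcher()
ready_per_call = []
for _ in range(6):
    s.push(512)
    ready_per_call.append(s.ready)
    s.pull(512)
print(ready_per_call)  # [0, 0, 0, 256, 512, 512]
```

In this toy the shortfall built up before the first block (2048 samples in, 256 out) never disappears on its own: once settled, each call produces exactly as much as it consumes. The caller therefore has to absorb that initial deficit with its own extra buffering.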
To be cautious, I would suggest buffering up the initial 2048 samples plus another 2048.
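For scale, here is what that suggested pre-buffer works out to at the 44.1 kHz rate and 512-sample block size from the original report. The 2048 figure is the typical first `getSamplesRequired` value mentioned above, and the rest is plain arithmetic.

```python
SAMPLE_RATE = 44_100  # Hz, from the original report
BLOCK = 512           # caller's block size, from the original report

initial = 2048        # typical first getSamplesRequired() value
extra = 2048          # additional safety margin suggested above
total = initial + extra

delay_ms = 1000 * total / SAMPLE_RATE  # added fixed delay in milliseconds
blocks = total // BLOCK                # how many caller blocks that spans

print(total, round(delay_ms, 1), blocks)  # 4096 92.9 8
```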
(This is all really designed for a pull model in which each processing unit can request as many samples as it needs from its supplier - combine that with an internally varying processing rate and the fundamentally long block sizes used and things indeed become a bit trickier in a case like this one.)
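Here is a rough Python sketch of that pull model, again using a toy stand-in rather than the real library. The consumer asks for `n` output samples, and the adapter keeps requesting exactly as many input samples from its upstream source as the stretcher says it needs. `ToyStretcher`, `required()`, `pull()` and `source()` are hypothetical names; in Rubber Band the analogous query is `getSamplesRequired`. The sketch assumes the source never returns zero samples, since otherwise the loop would not terminate.

```python
from typing import Callable


class ToyStretcher:
    """Crude model, not Rubber Band: waits for `window` buffered input
    samples, then emits one `hop` of output per `hop` of input consumed
    (i.e. a time ratio of 1). Tracks sample counts only, not audio."""

    def __init__(self, window: int = 2048, hop: int = 256) -> None:
        self.window = window
        self.hop = hop
        self.buffered = 0  # input samples held for analysis
        self.ready = 0     # output samples waiting to be retrieved

    def required(self) -> int:
        # Input still needed before the next block can be processed.
        return max(self.window - self.buffered, 0)

    def push(self, n: int) -> None:
        self.buffered += n
        while self.buffered >= self.window:
            self.buffered -= self.hop
            self.ready += self.hop


def pull(stretcher: ToyStretcher, n: int,
         read_input: Callable[[int], int]) -> int:
    """Produce n output samples, pulling input on demand.
    Returns how many input samples had to be read to do so."""
    consumed = 0
    while stretcher.ready < n:
        got = read_input(stretcher.required())
        stretcher.push(got)
        consumed += got
    stretcher.ready -= n
    return consumed


def source(k: int) -> int:
    # Hypothetical unlimited upstream supplier: always delivers k samples.
    return k


s = ToyStretcher()
print(pull(s, 512, source))  # 2304
print(pull(s, 512, source))  # 512
```

The first request has to read 2304 input samples to return 512; after that, input and output move in step. A pull-driven host absorbs this automatically, whereas a fixed-block push host like the one in this report has to buffer for it.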
Closing as resolved - let me know if this is not appropriate. Thanks!
Hi @cannam! Thanks for Rubber Band - big fan of the library.
I've recently helped integrate Rubber Band into Pedalboard, a Python audio effects library I maintain, but I'm having a bit of trouble with it. Namely, I'm using Rubber Band in real-time mode with threading disabled to fit into a plugin chain, and I find that with modest pitch shifts (1.25x), I'm getting audio dropouts at the start of processing.
For my test case, I'm using `setPitchShift(1.25)` and running Rubber Band with the following options:
When passing in fixed-size blocks of 512 samples (44.1kHz, stereo) to `process`, followed by immediate calls to `available` and `retrieve`, I get the following sequence of log lines from my test harness:
Only after supplying 2,048 samples do I start to get output, even though the stretcher reports only 819 samples of latency. Also, once output does start, it comes out in patches for the first few calls, each of which reports fewer samples available than were provided, until about 128 ms of audio has been passed in. This results in audible dropouts at the start of any audio file that's been processed.
I feel like I must be missing something obvious here - am I using Rubber Band incorrectly? Is there a way to ensure that Rubber Band supplies a constant audio stream (without any dropouts) when feeding it a constant stream of input? (Thanks for your help!)