Large audio chunks - Githubissues

pawel-glomski commented 1 year ago

Hi there, thanks for the amazing library.

I am working on python bindings for this library, to use in my media-player project. In general, python is quite slow, and handling/processing small audio frames (~1024 samples) is way slower than using larger chunks. Thus, I usually pass 1-2 seconds of audio (usually 40-80k samples) to the process() function.

While with R2 engine everything works great, R3 seems to have some problems with processing, or rather returning blocks of audio bigger than 65536. Take a look at this simple example and its output:

#include <rubberband/RubberBandStretcher.h>

namespace rb = RubberBand;
using Option = rb::RubberBandStretcher::Option;

static constexpr size_t SAMPLE_RATE = 44100;
static constexpr size_t CHANNELS_NUM = 2;
static constexpr size_t FRAME_SIZE = 80000;
static constexpr size_t ITERATIONS = 25;

std::array<float*, CHANNELS_NUM> get_ptr_per_channel(std::vector<float>& audio)
{
  std::array<float*, CHANNELS_NUM> ptr_per_channel;
  size_t const samples_num = audio.size() / CHANNELS_NUM;

  for (size_t channel_idx = 0; channel_idx < CHANNELS_NUM; ++channel_idx)
  {
    ptr_per_channel[channel_idx] = audio.data() + channel_idx * samples_num;
  }
  return ptr_per_channel;
}

int main()
{
  rb::RubberBandStretcher stretcher(SAMPLE_RATE, CHANNELS_NUM, Option::OptionProcessRealTime | Option::OptionEngineFiner);
  stretcher.setMaxProcessSize(FRAME_SIZE);

  std::vector<float> silence_data(2 * CHANNELS_NUM * FRAME_SIZE, 0.f);
  std::vector<float> output_buffer;

  size_t overflow = 0;
  for (size_t i = 0; i < ITERATIONS; ++i)
  {
    stretcher.process(get_ptr_per_channel(silence_data).data(), FRAME_SIZE, i + 1 == ITERATIONS);
    overflow += FRAME_SIZE - 65536;

    if (stretcher.available() * CHANNELS_NUM > output_buffer.size())
    {
      printf("resize to available = %i\n", stretcher.available());
      output_buffer.resize(stretcher.available() * CHANNELS_NUM);
    }
    printf("overflow = %zu\n", overflow);
    stretcher.retrieve(get_ptr_per_channel(output_buffer).data(), stretcher.available());
  }
}

Outputs:

resize to available = 65536
overflow = 14464
overflow = 28928
overflow = 43392
overflow = 57856
overflow = 72320
RubberBand: R3Stretcher::process: WARNING: Forced to increase input buffer size. Either setMaxProcessSize was not properly called, process is being called repeatedly without retrieve, or an internal error has led to an incorrect resampler output calculation. Samples to write: 2688
RubberBand: R3Stretcher::process: old and new sizes: (84096, 168192)
overflow = 86784
overflow = 101248
overflow = 115712
overflow = 130176
overflow = 144640
overflow = 159104
RubberBand: R3Stretcher::process: WARNING: Forced to increase input buffer size. Either setMaxProcessSize was not properly called, process is being called repeatedly without retrieve, or an internal error has led to an incorrect resampler output calculation. Samples to write: 5376
RubberBand: R3Stretcher::process: old and new sizes: (168192, 336384)
overflow = 173568
overflow = 188032
overflow = 202496
overflow = 216960
overflow = 231424
overflow = 245888
overflow = 260352
overflow = 274816
overflow = 289280
overflow = 303744
overflow = 318208
overflow = 332672
RubberBand: R3Stretcher::process: WARNING: Forced to increase input buffer size. Either setMaxProcessSize was not properly called, process is being called repeatedly without retrieve, or an internal error has led to an incorrect resampler output calculation. Samples to write: 10752
RubberBand: R3Stretcher::process: old and new sizes: (336384, 672768)
overflow = 347136
overflow = 361600

Since available() returns at most 65536, I figured the remaining samples must be still in the stretcher, thus I manually compute the overflow here, as FRAME_SIZE - 65536. As you can see, the overflow value aligns with the numbers in the warnings. When FRAME_SIZE <= 65536, the warnings disappear.

Is this a bug or are there some limits to the size of the audio block that the process() function can handle?

I am using single/RubberBandSingle.cpp build on x64 Ubuntu 22.04.

cannam commented 1 year ago

Thanks for the report! I believe this comes down to a misunderstanding probably caused by poor documentation of the API, specifically with the expectation that each process call should have no more than one matching retrieve call.

In practice, while available returns the number of samples that will be returned by the next retrieve call, this is not necessarily the total number that the stretcher is capable of producing without more input - if the input was very long then there may be limitations caused by internal buffers (as you discovered) that mean the output has to be retrieved in more than one call.

So your inner process loop structure typically wants something more equivalent to while (stretcher->available() > 0) { /* retrieve... */ } rather than merely if, after each process call.

I've added a few words to the API doc to try to make this more apparent, and also added a unit test using longer block sizes in realtime mode.

pawel-glomski commented 1 year ago

Thanks for the fast response!

Actually the first thing I did, after seeing the warnings, was to print the result of available() after each retrieve() call, to see if maybe there is still something to retrieve, but it was 0 every time, and it still is.

I have looked at the tests, and I think they don't actually cover this case at all (or maybe I am still misunderstanding something :sweat_smile:).

https://github.com/breakfastquay/rubberband/blob/ffaef18a9dc9513ca0af87bcfb7fa4b34bd716e8/src/test/TestStretcher.cpp#L273-L276

As you can see, the "block size" is not really used for choosing the size of an audio block that will be passed in a SINGLE process() call, rather at most getSamplesRequired() samples will be passed.

The following comment explains what bs (block size) is used for in this function: https://github.com/breakfastquay/rubberband/blob/ffaef18a9dc9513ca0af87bcfb7fa4b34bd716e8/src/test/TestStretcher.cpp#L253-L260

Just to clarify, I am talking about passing a large block of audio in a single process() call, like it is described in the documentation of setMaxProcessSize():

Tell the stretcher the maximum number of sample frames that you will ever be passing in to a single process() call.

Also, the test case you added uses timeRatio=0.5. This won't cover my case too, even with bs used for choosing the input chunk size. With such configuration, passing 80k samples to the process() would result in around ~40k output samples. The issue I am experiencing only occurs when the resulting audio chunk is longer than 65536 samples. You would have to use bs > 150k or timeRatio>0.85, only then the tests fail.

Keep in mind that this problem does not occur with R2 engine.

Even if your solution would work, I find it kind of awkward to work with. is it really by design? If I don't know what is the actual number of available samples, I cannot allocate big enough array to fit it all.

cannam commented 1 year ago

Thanks for the follow up - good points all I think.

As regards the time ratio, I initially wrote it as 1.1 but then changed it to match some of the other tests. I've changed it back to a longer value now. But you're right, the test isn't testing the same thing as you reported at all, is it? I'll look into this again.

cannam commented 1 year ago

I've been back over this and have corrected the test and (I believe) fixed the issue. It took longer than expected as I also found a related problem to do with handling an oversized block when making the final process call to the R2 engine. Please test with the current repo code.

Even if your solution would work, I find it kind of awkward to work with. is it really by design?

Kind of. The truly expected behaviour is that you call getSamplesRequired and pass that number of samples to process. If you pass more, then it's theoretically possible you may have to retrieve more than once to get all of the output. But I think that is an edge case and it wasn't the cause of the issue here, I was clearly mistaken about that.

pawel-glomski commented 1 year ago

Great! Didn't expect the fix so fast, especially for such a niche use case :sweat_smile:. It seems that everything works now, thanks a lot!

breakfastquay / rubberband

Large audio chunks #86