breakfastquay / rubberband

Official mirror of Rubber Band Library, an audio time-stretching and pitch-shifting library.
http://breakfastquay.com/rubberband/
GNU General Public License v2.0
561 stars 89 forks source link

Calling process() with final=true and samples=0 doesn't flush samples from R3 #77

Closed andymstone closed 1 year ago

andymstone commented 1 year ago

I'm using Rubberband in real time mode. Previously using the R2 engine I would repeatedly call process(input, samples, false) with my audio data and once I reach the end of the data I'd call process again with zero input samples and final=true:

while (!eof) {
  // ...read from input stream...
  rb->process(input, samples, false);
  // ... read output from rubber band...
}
rb->process(input, 0, true);
// ... read last bit of output from rubber band...

Which with R2 effectively flushes any samples remaining internally via available() and retrieve(). Using the R3 engine this doesn't happen - calling process(input, 0, true) doesn't change the number of samples available at the output, I think because line 682 of R3Stretcher.cpp has while (inputIx < int(samples)) - and since samples is zero the block doesn't get executed and the internal samples are never made available (and the very end of the audio data is lost).

I don't know if this is an intentional behaviour change or if I was relying on undocumented behaviour previously. But calling process with zero samples was a very convenient way of flushing rubberband when it was difficult to determine in advance if any given buffer is the final buffer. It would be great if that behaviour could be made available in R3.

andymstone commented 1 year ago

On further investigation, if I provide the final flag to the last process call with the last bit of real data, with R3 the output seems to be padded at the end. If the last audio data that was provided to process did not fade to zero then I would expect the output to also not fade to zero, but it seems that with R3 it does. Trimming off 2*rb->getStartDelay() seems to get rid of the extra fade out at the end. Just to be clear: it seems to be extra samples appearing at the output, not that the expected output is being attenuated. If I chop off the extra samples I get the output I would have expected.

R2 didn't have the extra data at the end.

Possibly this is a separate issue?

cannam commented 1 year ago

Thanks for the report! These do sound like possibly two separate issues, but I'll investigate and I can always separate them out if appropriate.

cannam commented 1 year ago

The initial issue reported here is now fixed. The cause was pretty much where you pointed - in version 3.0 it actually still worked the way you expected, but a change for v3.1 in rev 021de9d5 to the way data was fed in removed the call to consume in cases where no data had been provided at input. I've fixed this and added a regression test for it.

The question about excess data being produced at output is a separate issue, and that's currently unchanged. The situation here is that, historically in R2 for example, the realtime mode has never guaranteed that it will produce any particular number of samples at output at all. It is dependent on the content as well as on processing parameters: you can get more samples than you expect, but also sometimes fewer (which is generally bad, but it's an artifact of the way R2 works that is hard to escape). It just waits for you to provide the final input block, then drains its internal buffers. Applications using realtime mode have generally been expected to keep track of how many samples they want to see and truncate the output themselves if necessary (the command-line tool does this for example - search main.cpp for the comment saying "in offline mode the stretcher handles this itself").

The situation with R3 is theoretically the same as R2, but there are two practical differences: (i) R3 is internally more predictable and less content-dependent, so in practice it does produce a more predictable number of spare samples at the end, and (ii) its internal buffers are longer and the process of draining them produces substantially more spare samples, including many that are actually always zero. So it would be both easier and more effective to trim the end for you automatically with R3. On the other hand, perhaps it's best not to encourage applications to expect that, since it isn't the case for R2 and that is not going to change. Also it would need a bit of analysis (and quite a lot of testing) to be sure that the correct amount was being trimmed and no more, especially in cases where the time and pitch ratio have changed recently. It would be seem more problematic for a caller to find that their output was suddenly too short and they had no way to recover the missing part, than to have to trim it themselves. So I am in two minds here. There is certainly a good case for stopping a little earlier, especially since (as I mentioned above) I believe the final samples are always zero - as apart from anything else, it takes CPU power to generate them. So I might well look at this again.

Anyway, the original issue is fixed!

andymstone commented 1 year ago

Thanks for fixing the original issue!

I agree you definitely don't want less data output than expected and if extra output samples is how it works then that's fine (I was just caught out because in R2 the extra samples weren't noticeable in my use case).

cannam commented 1 year ago

Fix is in 3.1.3 and 3.2.0.