breakfastquay / rubberband

Official mirror of Rubber Band Library, an audio time-stretching and pitch-shifting library.
http://breakfastquay.com/rubberband/
GNU General Public License v2.0

16 kHz (or lower) input sample rate causes distorted output when using v3 engine #70

Closed rotemdan closed 1 year ago

rotemdan commented 1 year ago

I can reproduce this with both the library and the standalone app (demo v3.0.1).

Once I resample the input to a higher sample rate such as 22050, 24000 or 48000 Hz, the distortion goes away.

Lower sample rates such as 12000 and 8000 Hz also have the issue. Slightly higher ones such as 17000 and 18000 Hz don't.

cannam commented 1 year ago

Can confirm. Looking into this now.

cannam commented 1 year ago

Thanks for reporting this!

I've committed & pushed a change that should make it handle these rates better - please give it a try.

Both the R2 and R3 engines are intentionally tuned for best sound quality and efficiency at 44100 or 48000 Hz (there should be no noticeable difference between those two), and are best used at one of those rates unless there's some very compelling practical reason not to.

But they should both work sensibly at any audio rate, even if the output isn't optimal, and should not crash or sound obviously wrong. So if you still find cases in which the output is plainly nonsense, I'd like to hear about it.

I've also added a note about sample rate to the constructor's inline documentation.

rotemdan commented 1 year ago

Thanks for fixing.

Some of the first test files I tried were 16 kHz (mostly speech recognition training data), so I immediately noticed that the standalone app produced distorted output with the newer engine. At first I thought that was the intended output (i.e. the new engine was experimental, had known issues, etc.); then I retried with upsampled versions of the same test files and realized this could be a real issue.

Since the standalone app is intended to be used by non-programmers, I thought it would be important to ensure that they get valid output even if they use files with non-optimal sample rates (they may not be aware of the sample rate or understand what it means). A simple workaround would have been to unconditionally resample the input to, say, 48000 Hz, but you decided to investigate and fix the core of the underlying issue, and that's highly appreciated!
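For anyone stuck on an older build, the resample-first workaround mentioned above could look roughly like the sketch below. It is illustrative only: `resample_linear` and `to_working_rate` are hypothetical names, not part of Rubber Band, and plain linear interpolation is a crude stand-in for a proper resampler such as Speex.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono sequence of floats by linear interpolation.

    Naive on purpose: no anti-aliasing or anti-imaging filter is applied,
    so a real resampler should be preferred for production audio.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or src_rate == dst_rate:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        # Position of output sample i, measured in input samples.
        pos = i * src_rate / dst_rate
        j = min(int(pos), last)
        k = min(j + 1, last)
        frac = pos - j if j < last else 0.0
        out.append(samples[j] * (1.0 - frac) + samples[k] * frac)
    return out


def to_working_rate(samples, rate, working_rate=48000):
    """Unconditionally convert the input to working_rate (assumed 48000 Hz)."""
    return resample_linear(samples, rate, working_rate), working_rate


# One second of a toy 16000 Hz signal, converted before stretching.
low_rate_input = [0.0, 1.0, 0.0, -1.0] * 4000
converted, new_rate = to_working_rate(low_rate_input, 16000)
```

The converted samples and the new rate would then be handed to the stretcher in place of the original input, and the result resampled back down afterwards if the original rate is needed.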

I pulled the latest updates, recompiled and tried many random sample rates (some outside the range specified), from 2000 to 250000 (I'm using the Speex resampler) and they sounded as expected. I haven't seen any issue so far.
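A rough way to automate part of a sweep like this is sketched below. It is not the procedure I actually used (that was listening to the output): `process` is a hypothetical callable standing in for whatever invokes the stretcher, and the check only catches empty, non-finite or wildly out-of-range output, not audible distortion.

```python
import math


def sine(rate, freq=220.0, seconds=0.25):
    """Generate a short mono test tone at the given sample rate."""
    n = int(rate * seconds)
    return [math.sin(2.0 * math.pi * freq * i / rate) for i in range(n)]


def looks_sane(output, peak_limit=1.5):
    """Crude check: non-empty, every sample finite and within peak_limit."""
    return bool(output) and all(
        math.isfinite(x) and abs(x) <= peak_limit for x in output
    )


def sweep(process, rates):
    """Run process(samples, rate) at each rate; return the rates that fail."""
    return [r for r in rates if not looks_sane(process(sine(r), r))]


def passthrough(samples, rate):
    """Placeholder for a real time-stretcher call; returns input unchanged."""
    return samples


failures = sweep(passthrough, [2000, 8000, 16000, 44100, 250000])
```

Swapping `passthrough` for a function that actually runs the stretcher would flag any rate at which the output is plainly broken; listening is still needed to judge quality.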