breakfastquay / rubberband

Official mirror of Rubber Band Library, an audio time-stretching and pitch-shifting library.
http://breakfastquay.com/rubberband/
GNU General Public License v2.0

R3 Mid/Side processing for max possible mono compatibility #81

Closed: atskler closed this issue 1 year ago

atskler commented 1 year ago

Hi! I found that the new R3 realtime processing produces excellent results for stereo mixes, but the stereo width is often quite disturbed, and because of phase issues the result is not mono compatible (the mono sum sounds something like an old, bad 96-128 kbps MP3). Mono compatibility is a must in broadcasting and really important for PA systems: in the first case the listener may have a mono system (most probably her phone's speaker), and in a PA system the monitor path can easily be a mono sum.
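As a rough way to quantify the mono-sum problem described above, one can compute the mono downmix and a simple phase-correlation figure for a stereo block. This is only a sketch: the function names `monoSum` and `phaseCorrelation` are hypothetical and not part of Rubber Band, and the subtle artefacts described here may not show up as a strongly negative correlation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Mono downmix of a stereo block: (L + R) / 2 per sample.
// Hypothetical helper, not part of the Rubber Band API.
std::vector<float> monoSum(const std::vector<float> &left,
                           const std::vector<float> &right) {
    const std::size_t n = std::min(left.size(), right.size());
    std::vector<float> mono(n);
    for (std::size_t i = 0; i < n; ++i) {
        mono[i] = 0.5f * (left[i] + right[i]);
    }
    return mono;
}

// Normalised correlation between L and R, in the range [-1, 1].
// Values near +1 sum well to mono; values near -1 mean the channels
// largely cancel when summed. Returns 0 for silent input.
double phaseCorrelation(const std::vector<float> &left,
                        const std::vector<float> &right) {
    const std::size_t n = std::min(left.size(), right.size());
    double lr = 0.0, ll = 0.0, rr = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        lr += static_cast<double>(left[i]) * right[i];
        ll += static_cast<double>(left[i]) * left[i];
        rr += static_cast<double>(right[i]) * right[i];
    }
    const double denom = std::sqrt(ll * rr);
    return denom > 0.0 ? lr / denom : 0.0;
}
```

Comparing these figures for the source and for the stretched output, block by block, gives a crude indication of whether processing has reduced mono compatibility.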

After experimenting with R3 a bit I found that a simple M/S pre- and de-emphasis can greatly minimize the mentioned problems, provide mono compatibility and still give breathtakingly excellent results.

So my question is whether an M/S pre- and de-emphasis option could be implemented for R3 (realtime), in both the API and the CLI.

Thank you in advance,

Attila S.

cannam commented 1 year ago

Oof, that's interesting. And unfortunate.

In the R2 engine, the processing option OptionChannelsTogether, which is the same as the flag --centre-focus in the command-line tool, does just what you ask for - it surrounds normal processing with a mid-side decomposition and recomposition. That's exactly how the option works, nothing more and nothing less.
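The "mid-side decomposition and recomposition" around normal processing can be sketched as follows. This is an illustration only, not the R2 source: `StereoBuffer`, `encodeMidSide`, `decodeMidSide` and `processMidSide` are hypothetical names, and the 0.5 scaling on encode is an assumption (other scalings are equally valid as long as decode inverts encode).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical container for one block of stereo audio.
struct StereoBuffer {
    std::vector<float> left;   // holds mid after encoding
    std::vector<float> right;  // holds side after encoding
};

// Encode in place: mid = (L + R) / 2, side = (L - R) / 2.
void encodeMidSide(StereoBuffer &buf) {
    const std::size_t n = std::min(buf.left.size(), buf.right.size());
    for (std::size_t i = 0; i < n; ++i) {
        const float l = buf.left[i];
        const float r = buf.right[i];
        buf.left[i] = 0.5f * (l + r);
        buf.right[i] = 0.5f * (l - r);
    }
}

// Decode in place: L = mid + side, R = mid - side.
void decodeMidSide(StereoBuffer &buf) {
    const std::size_t n = std::min(buf.left.size(), buf.right.size());
    for (std::size_t i = 0; i < n; ++i) {
        const float m = buf.left[i];
        const float s = buf.right[i];
        buf.left[i] = m + s;
        buf.right[i] = m - s;
    }
}

// Wrap any two-channel processing step (for example a time-stretcher
// call) between mid-side encode and decode.
template <typename Process>
void processMidSide(StereoBuffer &buf, Process process) {
    encodeMidSide(buf);
    process(buf);   // operates on mid (left) and side (right)
    decodeMidSide(buf);
}
```

Because the mid channel is exactly the mono sum, whatever the processing step does to it is what a mono listener hears, which is why this arrangement tends to preserve mono compatibility.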

Unfortunately in R3 it doesn't do that any more, because a more subtle way of reducing the stereo expansion effect, at least when listening using stereo equipment, became available in the core algorithm for R3. So OptionChannelsTogether (and therefore --centre-focus) uses that instead, and there is no mid-side option available any more in R3.

This looks like a significant problem.

atskler commented 1 year ago

--centre-focus does not mitigate this problem as much as M/S processing does (I can upload sound samples if you need them). Without M/S in R3, R2 still seems to be the best compromise, as you mentioned, if the result is intended for broadcasting or PA use (and if there is no way to do third-party M/S pre- and de-emphasis).

cannam commented 1 year ago

I'll do some tests tomorrow and try to decide on a way forward. The main problem is not in the implementation (mid-side is simple enough and we were already doing it anyway in R2), but in deciding on the right way to update the API. It might depend on whether a user would ever want to use both centre-focus (R3-style) and mid-side at the same time. I'll try various combinations and see how they work out, but if you have an opinion about that aspect, do say so.

Either way, it should be possible to solve this quite quickly I hope.

cannam commented 1 year ago

Thank you for reporting this, by the way. It's very valuable feedback that shows not only a problem with the library but also a significant limitation in our test procedure. I do appreciate it.

atskler commented 1 year ago

You are welcome. When I studied sound engineering at technician level (in 2004), my teachers mentioned mono compatibility many times: PA systems are often mono, AM radio is mono, TV is mono, FM falls back to mono when the field strength is low, and so on. Many receivers and playback devices, especially the portable ones, have just one speaker (remember, this was 2004). So the result really must be compatible and enjoyable, even spectacular, on mono systems, or it will not convince and will not sell.

I tried R3 with centre-focus fed with an M/S signal, and it creates something like 'crosstalk' between the mid and side channels; for example, the side gets more low-frequency content than it originally had. I would say this combination should be avoided.

The standard R3 processing with M/S seems to be the closest and most faithful to the original.

cannam commented 1 year ago

> I can upload sound samples if you need them

Actually it might be helpful to have one example to add to my test set - preferably the most extreme example you have found in normal use. No more than 30 seconds of source audio needed. Feel free to email me if you would prefer not to attach a link publicly.

(I can easily reproduce the general effect, but it would be good to know that I'm working with something that is about as bad as it seems to get.)

If your example can be used to illustrate the effect you mentioned in your last comment (about mid-side + centre-focus) that would also be useful.

atskler commented 1 year ago

@cannam I sent an email.

cannam commented 1 year ago

Thanks!

cannam commented 1 year ago

The branch midside3 now contains a fix which is in testing. If you (or others) have the opportunity to build and test from this branch, feedback would be appreciated.

This fix actually replaces the behaviour of the --centre-focus flag (and OptionChannelsTogether option) for R3 with an alternative based on mid-side, so in order to test the fix, you just need to run e.g. the command-line tool with --centre-focus and you should find the output is now mono-compatible.

When first working on R3 I was reluctant to use mid-side for this option because it wasn't producing a stable enough stereo image. But I find that it can be improved significantly by a simple adjustment to take advantage of the mid-side structure, namely, whenever a bin is chosen for a phase-reset in the side channel, reset it also in the mid channel because salience in the side channel means it is more likely to benefit from phase coherence in the stereo field.
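The adjustment described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each channel carries a per-bin boolean flag marking the bins chosen for a phase reset in the current frame; the actual R3 data structures may look quite different, and `ResetFlags` and `propagateSideResetsToMid` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical representation: flags[i] is true when frequency bin i
// has been chosen for a phase reset in this frame.
using ResetFlags = std::vector<bool>;

// Whenever a bin is marked for reset in the side channel, mark the same
// bin for reset in the mid channel too. Resets already chosen for the
// mid channel are left untouched.
void propagateSideResetsToMid(ResetFlags &mid, const ResetFlags &side) {
    const std::size_t n = std::min(mid.size(), side.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (side[i]) {
            mid[i] = true;
        }
    }
}
```

The propagation is deliberately one-way: a reset in the mid channel does not force one in the side channel, matching the description that salience in the side channel is what triggers the shared reset.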

With this in place, I think the results (in stereo and mono) are good enough to be used instead of the previous behaviour of OptionChannelsTogether - this avoids adding a whole new option and restores mono compatibility for this option as in R2. However because this means changing the output for an existing option, I want to be as confident as possible that it really works.

atskler commented 1 year ago

Dear Chris,

I built the midside3 branch and it sounds really good in stereo and its mono compatibility is excellent, as far as I can tell (I tested the realtime mode, only with --centre-focus).

Some other observations while testing:

I hope all this helps,

Attila

cannam commented 1 year ago

Thanks! Yes, I will have to update the docs. The description for --centre-focus is as it is currently because the quality of this option was never truly satisfactory in R2. This new implementation for R3 is substantially better.

The --pitch-hq option has no effect if you are not actually doing pitch shifting, which I assume is why those commands produce the same result.

cannam commented 1 year ago

This fix is now in 3.2.0. Thanks again for the report and feedback!

atskler commented 1 year ago

> This fix is now in 3.2.0. Thanks again for the report and feedback!

You are welcome. And thank you for the fix.