foo86 / dcadec

DTS Coherent Acoustics decoder with support for HD extensions

How to downmix 5.1 into stereo properly? I got overflow even with the -3DB Matrix by Summing the individual output channels. #34

Open eviluess opened 9 years ago

eviluess commented 9 years ago

Hey! First of all, thanks for your work on this. I tried to get a stereo downmix of DTS files. However, I found that the -2 option does not work if the DTS file doesn't contain downmix coefficients. So I tried to sum the individual audio channels with the following formula: Lo = FL + 0.707(C+SL); Ro = FR + 0.707(C+SR). This distorts the final stereo file. I compared with ffmpeg, and its downmix seems to make the volume very low (about -9 dB).

So how should I properly downmix DTS files that don't contain coefficient info?

Looking forward to your reply. Thanks!

MarcusJohnson91 commented 9 years ago

Are you sure you're not supposed to subtract 0.707 from the center and rear channels, before combining them into the output stereo channel?

eviluess commented 9 years ago

Do you mean the correct formula should be: Lo = FL - 0.707(C+SL); Ro = FR - 0.707(C+SR)?

MarcusJohnson91 commented 9 years ago

I mean Lo = FL + (C - 0.707) + (SL - 0.707)

eviluess commented 9 years ago

I'm confused. I think this would cause a DC offset of -1.414.

Nevcairiel commented 9 years ago

Subtraction is clearly the wrong way to go. You are supposed to multiply C by 0.707 in any case, since that compensates for cloning the (single) C channel into two channels. Most people also multiply the surrounds by that to keep the surrounds from interfering with the front sounds too much (i.e. your original formula).

However, this formula can cause overflows if the original audio signal already is at full volume. The only way to reliably combat this is to reduce the overall volume to avoid clipping - this is what ffmpeg does. It will result in a somewhat quieter track, but it's the only way to be absolutely sure that it will never overflow/clip.
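To make the overflow concrete, here is a minimal Python sketch of the original formula applied to one frame of samples. The function name `downmix_frame` and the float sample range of [-1.0, 1.0] are assumptions made for illustration; nothing here is taken from dcadec or ffmpeg.

```python
def downmix_frame(fl, fr, c, sl, sr, k=0.707):
    """Apply Lo = FL + k*(C + SL), Ro = FR + k*(C + SR) to one frame.

    All inputs are assumed to be float samples in [-1.0, 1.0].
    """
    lo = fl + k * (c + sl)
    ro = fr + k * (c + sr)
    return lo, ro


# Worst case: every channel sits at full scale with the same sign.
lo, ro = downmix_frame(1.0, 1.0, 1.0, 1.0, 1.0)
print(lo, ro)  # both are well above 1.0, so a fixed-range output would clip
```

With real material the channels rarely peak together, which is why the unscaled formula only distorts on loud passages.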

ghost commented 9 years ago

> The only way to reliably combat this is to reduce the overall volume to avoid clipping - this is what ffmpeg does.

Actually, libswresample and libavresample have different defaults. libswresample doesn't do it AFAIK, which will sometimes result in clipping. I guess it boils down to the user's preference: get audio that's "too silent", or audio that might clip.

eviluess commented 9 years ago

So there's no way to produce an established downmix result with unsuppressed gain and without any clipping?

Nevcairiel commented 9 years ago

> libswresample doesn't do it AFAIK, which will sometimes result in clipping.

swresample is a bit dumb. It does it when you use integer internal/output, but doesn't with float output, so yeah.

Nevcairiel commented 9 years ago

> So there's no way to produce an established downmix result with unsuppressed gain and without any clipping?

Not without analyzing the audio first to get its real peak information. A general 1-pass operation has to assume that all channels can contain full-range audio at the same time, and if you combine 1.0 + 0.707 + 0.707 it will overflow, so you have to reduce the volume, i.e. effectively divide by 2.414 (which is roughly a 7.65 dB reduction).
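The two options described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names (`worst_case_gain`, `gain_db`, `measured_gain`) are hypothetical, samples are floats in [-1.0, 1.0], and the second function needs the whole signal up front, so it is a two-pass approach.

```python
import math


def worst_case_gain(coeffs):
    """1-pass gain: safe even if every mixed channel peaks at the same time."""
    return 1.0 / sum(abs(c) for c in coeffs)


def gain_db(gain):
    """Convert a linear gain factor to decibels."""
    return 20.0 * math.log10(gain)


def measured_gain(frames, k=0.707):
    """2-pass gain: scan the actual downmix peak, attenuate only if it exceeds 1.0.

    frames is an iterable of (fl, fr, c, sl, sr) float tuples.
    """
    peak = 0.0
    for fl, fr, c, sl, sr in frames:
        peak = max(peak, abs(fl + k * (c + sl)), abs(fr + k * (c + sr)))
    return 1.0 if peak <= 1.0 else 1.0 / peak


g = worst_case_gain([1.0, 0.707, 0.707])
print(round(gain_db(g), 2))  # about -7.65 dB
```

The measured gain is never lower than the worst-case gain, so a two-pass downmix is at least as loud, at the cost of reading the audio twice.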

eviluess commented 9 years ago

That's why I started the discussion.

I found that some players request a 5.1 channel configuration from the sound card via the waveOutOpen API by setting the corresponding channel flags (0x3F) in dwChannelMask, and this does not lead to any overflow even when I turn both the player volume and the system volume to the max. The sound is louder than when playing the downmix generated by ffmpeg (-7 dB) directly in the player.

It seems that the sound card can do the downmix correctly? It has no chance to run a peak-scanning pass first.

MarcoRavich commented 3 years ago

Hi there, sorry to revive this old discussion, but I'm looking for someone able to correctly implement the "Novel 5.1 Downmix Algorithm with Improved Dialogue Intelligibility" research: ndmix

Similar to the state-of-the-art downmix methods, only 5 channels are taken into consideration: L, R, C, Ls and Rs. We can represent the downmix operation in the form of the following equation:

lt[n] = l[n] + 0.707 c[n] + (dlev - 1) e[n] + 0.5 ls[n]
rt[n] = r[n] + 0.707 c[n] + (dlev - 1) e[n] + 0.5 rs[n]

where e[n] is the extracted voice signal, dlev represents the dialogue level and all considered signals are represented in the digital domain, in which n denotes the sample index.
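As a rough illustration, the two equations translate to the per-sample Python function below. The name `ndmix_frame` is hypothetical, and the extracted voice sample `e` is taken as an input because the quoted text does not say how it is obtained.

```python
def ndmix_frame(l, r, c, ls, rs, e, dlev):
    """Evaluate the quoted lt[n] / rt[n] equations for a single sample index n.

    e is the extracted voice sample (how to extract it is not covered here),
    dlev is the dialogue level; dlev = 1 removes the voice term entirely.
    """
    voice = (dlev - 1.0) * e
    lt = l + 0.707 * c + voice + 0.5 * ls
    rt = r + 0.707 * c + voice + 0.5 * rs
    return lt, rt


# With dlev = 1 this reduces to a plain downmix with surrounds at 0.5.
print(ndmix_frame(0.1, 0.2, 0.3, 0.4, 0.5, e=0.9, dlev=1.0))
```

Like the simpler formula, this sum can exceed full scale, so it still needs attenuation or peak analysis before being written to a fixed-range format.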

Someone on the Hydrogenaudio forums implemented it this way:

ffmpeg -i 6chan-input.wav -af "pan=stereo|FL < 1.0FL + 0.707FC + 0.707BL|FR < 1.0FR + 0.707FC + 0.707BR" -ac copy stereo.wav

Do you think this is correct (and proper)?
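For comparison, note that the command above weights the surrounds by 0.707 rather than the 0.5 in the quoted equations, and it has no term for the extracted voice signal e[n]. In ffmpeg's pan filter, writing `<` instead of `=` asks for the gains of that output channel to be renormalized so they total 1. A minimal Python sketch of that renormalization, assuming the gains are divided by the sum of their absolute values (the `renormalize` helper is hypothetical, not ffmpeg code):

```python
def renormalize(gains):
    """Scale a dict of channel gains so their absolute values sum to 1."""
    total = sum(abs(g) for g in gains.values())
    return {ch: g / total for ch, g in gains.items()}


left = renormalize({"FL": 1.0, "FC": 0.707, "BL": 0.707})
for ch, g in left.items():
    print(ch, round(g, 3))
```

Under that assumption the effective left-channel mix is FL divided by 2.414 plus FC and BL each scaled by 0.707 / 2.414, which is the same worst-case attenuation discussed earlier in the thread.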

MarcusJohnson91 commented 3 years ago

@forart this project is very dead.

The decoder was moved to ffmpeg, talk to them.

MarcusJohnson91 commented 2 years ago

DCADec hasn’t been updated in like 4-5 years, it’s been merged into FFmpeg.

In FFmpeg you’re looking to remap the channels, search that.

On Feb 1, 2022, at 8:44 PM, damian101 wrote:

 I just use the same command I use for 6.1 and 7.1 too: -af 'lowpass=c=LFE:f=120,pan=stereo|FL=.3FL+.21FC+.3FLC+.3SL+.3BL+.21BC+.21LFE|FR=.3FR+.21FC+.3FRC+.3SR+.3BR+.21BC+.21LFE' Maybe not ideal loudness-wise for 5.1, but I usually normalize to -23 LUFS anyway.
