facebookresearch / demucs

Code for the paper Hybrid Spectrogram and Waveform Source Separation
MIT License
8.12k stars 1.02k forks

Stems have different loudness #224

Open lucellent opened 2 years ago

lucellent commented 2 years ago

The new Demucs v3 is awesome. It really is a big improvement over the previous version, and it's definitely the best open-source tool right now.

There's one thing I noticed, however: some stems have different loudness levels than others, so when you combine all 4 stems in the end, the result is not the same as the original song.

The difference varies, but in general I found that the drums are quieter than they should be, by around 2-3 dB. The vocals are quieter too, but not by much.

I figured this out by phase-inverting the original song, playing it together with all 4 stems, and adjusting the stem levels until I heard nothing. I don't know if this is a bug or just how Demucs works, but is there a possible fix? I'm running songs with 0.15 overlap. When I tried 0.00 overlap, it seemed to make the drums specifically louder, but I think still not enough. (I don't know whether a lower overlap value is better or worse for the stems.)
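Playing the stems against a phase-inverted copy of the original is equivalent to subtracting the sum of the stems from the original and listening to what is left. A minimal sketch of that check, assuming the original and the four stems are already loaded as NumPy arrays with the same sample rate, length and alignment (loading the files is left out); `null_test_residual_db` is a hypothetical helper name, not part of Demucs:

```python
import numpy as np


def null_test_residual_db(mixture, stems, eps=1e-12):
    """Level of (mixture - sum of stems) relative to the mixture, in dB.

    mixture: array of shape (channels, samples)
    stems:   array of shape (n_stems, channels, samples)
    Subtracting the stem sum from the mixture is the numerical equivalent of
    playing the stems together with a phase-inverted copy of the original.
    """
    mixture = np.asarray(mixture, dtype=np.float64)
    stems = np.asarray(stems, dtype=np.float64)
    residual = mixture - stems.sum(axis=0)
    residual_power = np.mean(residual ** 2)
    mixture_power = np.mean(mixture ** 2)
    # eps avoids log(0) when the stems cancel the mixture perfectly
    return 10 * np.log10((residual_power + eps) / (mixture_power + eps))
```

A very negative value means the stems null well against the original. For a real separation some residual is expected, so this is most useful for comparing settings (for example, different overlap values) rather than for expecting silence.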

adefossez commented 2 years ago

Thank you for reporting this issue. It is possible that the quieter output is due to some part of the signal being dropped. With the previous iteration of Demucs, I remember the sum of all stems being very close to the full signal; however, this may no longer be the case, and it should be investigated.

Sadly, in the short term I am unlikely to provide an automated fix, so you would have to adjust the volume yourself.

Higher overlap is usually better (but also slower to evaluate). Zero overlap will result in some discontinuities every 44 seconds.
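To illustrate why overlap matters when a long track is processed in segments, here is a generic overlap-add sketch: each segment is processed on its own, and overlapping outputs are blended with triangular weights. This illustrates the general technique only and is not Demucs's actual code; `chunked_apply` and `fn` are hypothetical names, and `fn` stands in for a model that returns an output of the same length as its input segment.

```python
import numpy as np


def chunked_apply(fn, signal, segment, overlap):
    """Apply fn to overlapping segments of a 1-D signal and blend the results.

    overlap: fraction of each segment shared with the next one, in [0, 1).
    Overlapping outputs are cross-faded with triangular weights.
    """
    length = signal.shape[-1]
    stride = max(1, int((1 - overlap) * segment))
    out = np.zeros(length)
    weight_sum = np.zeros(length)
    # Triangular window: large in the middle of a segment, small near its edges
    window = np.concatenate([
        np.arange(1, segment // 2 + 1),
        np.arange(segment - segment // 2, 0, -1),
    ]).astype(np.float64)
    for start in range(0, length, stride):
        chunk = signal[start:start + segment]
        n = chunk.shape[-1]  # the last chunk may be shorter than `segment`
        processed = fn(chunk)
        w = window[:n]
        out[start:start + n] += w * processed
        weight_sum[start:start + n] += w
    # Normalize so that the weights applied to each sample sum to 1
    return out / weight_sum
```

With `overlap=0`, every sample comes from exactly one segment, so anything the model does differently near the edges of a segment shows up as a hard cut at each boundary. With overlap, those edge regions are cross-faded with the neighboring segment, at the cost of processing more segments.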

awesomer commented 2 years ago

@lucellent: Could you share the process by which you adjust the levels to match the original? Matching the original loudness when all of the stems are layered back together matters a lot for some of my use cases, and any advice on how to do so would be greatly appreciated. I am also curious about any findings you have on how much overlap might help minimize this issue; these cases are a minority of my use cases, and I'll happily trade GPU compute time for better accuracy in this regard.

lucellent commented 2 years ago

@awesomer it's not really complicated: load all the stems plus the original song into software like Adobe Audition or Audacity, and invert the phase of the original track. When you play everything back (each stem on a separate track, plus the inverted original song), you'll hear any loudness mismatch.

For example, if the drums are supposed to be louder, you'll hear drums. Then it's just a matter of slowly increasing their volume until you hear silence, which means you've matched the level.
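Instead of adjusting each fader by ear, the same goal (making what remains after the null test as quiet as possible) can be phrased as a least-squares problem: find one gain per stem so that the weighted sum of the stems best matches the original. A minimal sketch with NumPy, assuming the original and the stems are aligned arrays of the same length and sample rate; `fit_stem_gains` and `gain_to_db` are hypothetical helper names:

```python
import numpy as np


def fit_stem_gains(mixture, stems):
    """Least-squares gains g minimizing ||mixture - sum_i g[i] * stems[i]||^2.

    mixture: array of shape (channels, samples)
    stems:   array of shape (n_stems, channels, samples)
    Returns one linear gain per stem.
    """
    mixture = np.asarray(mixture, dtype=np.float64)
    stems = np.asarray(stems, dtype=np.float64)
    n_stems = stems.shape[0]
    # Each column of A is one flattened stem; b is the flattened mixture
    A = stems.reshape(n_stems, -1).T
    b = mixture.reshape(-1)
    gains, *_ = np.linalg.lstsq(A, b, rcond=None)
    return gains


def gain_to_db(gain):
    """Convert a linear amplitude gain to decibels."""
    return 20 * np.log10(np.abs(gain))
```

For reference, the 2-3 dB boost mentioned earlier corresponds to a linear gain of roughly 1.26 to 1.41 (10^(2/20) and 10^(3/20)). Fitted gains can come out unstable, or even negative, when stems are strongly correlated or one stem is nearly silent, so they are best treated as a starting point for manual adjustment.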

Hope that explains it?

awesomer commented 2 years ago

FTR, I tried the above technique, but it seems that not only do the tracks differ in overall volume, their volume also differs over the course of the song, so correcting one part of a track this way throws other parts off.
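One way to check this observation objectively is to estimate, window by window, the gain that one stem would need while the other stems are left at unity. If the estimates drift substantially between windows, no single fader setting can null that stem, which would match what is described above. A minimal sketch, assuming aligned NumPy arrays of shape (channels, samples) for the original and (n_stems, channels, samples) for the stems; `windowed_stem_gain` and its parameters are hypothetical:

```python
import numpy as np


def windowed_stem_gain(mixture, stems, stem_index, sr, window_s=5.0, eps=1e-12):
    """Per-window least-squares gain for one stem, other stems fixed at 1.

    In each window the gain is <target, stem> / <stem, stem>, where
    target = mixture - (sum of the other stems).
    Returns one gain per window (NaN where the stem is silent).
    """
    mixture = np.asarray(mixture, dtype=np.float64)
    stems = np.asarray(stems, dtype=np.float64)
    stem = stems[stem_index]
    others = stems.sum(axis=0) - stem
    target = mixture - others
    win = int(window_s * sr)
    gains = []
    for start in range(0, mixture.shape[-1], win):
        s = stem[..., start:start + win]
        t = target[..., start:start + win]
        energy = np.sum(s * s)
        gains.append(np.sum(t * s) / energy if energy > eps else np.nan)
    return np.array(gains)
```

For example, passing the vocal stem's index with `window_s=5.0` returns one gain per 5-second window; plotting these values against time would show whether the mismatch is constant or varies over the song.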

lucellent commented 2 years ago

Are you sure? I've been correcting the stem volumes for every track I've done, and I don't think I've encountered this issue. Note that even when all the tracks are at their correct levels and inverted against the original song, you will still hear some noise; it's not supposed to be completely silent. But I'd still love it if this issue weren't there in the first place (it doesn't happen every time, but it does most of the time).

awesomer commented 2 years ago

It certainly seemed that way when I tried to invert "Topdown" by Channel Tres. Specifically, I was unable to adjust the vocal channel: it would cancel out in one part but remain clearly audible in others. I'd be curious whether you get the same result with that song as input.

CarlGao4 commented 2 years ago

Can you upload some of your files? That would make it easier for us to find the bug.

awesomer commented 2 years ago

@CarlGao4 : My example file here, where the vocal channel seems to become different volumes throughout, is - https://www.dropbox.com/s/9zcgvbtn7kyyy0x/4.%20Channel%20Tres%20-%20Topdown.flac?dl=0