facebookresearch / demucs

Code for the paper Hybrid Spectrogram and Waveform Source Separation
MIT License

Audio stretching causing "phasing" with Demucs 3 Model B #285

Open brubsby opened 2 years ago

brubsby commented 2 years ago

❓ Questions

Hi! Demucs 3 Model B has taken the remixing community I'm part of by surprise with its huge jump in quality over spleeter and izotope rx 9.

We would be using it as our SOTA by now if not for one problem we've experienced that we're baffled by. The majority of audio stretching algorithms bundled with digital audio workstations seem to introduce what we can only describe as "phasing" when the stems from the model are stretched individually.

Our current best guess is that the artifacts described in Pons et al. (2021) are interacting with stretching algorithms in some unforeseen way. But we're curious whether you have any theories as to why this is happening or how we could get around it. We also wanted to inform you of a real-world quality metric that you might not have considered, and hope it might guide future research! I could imagine that using an industry-standard stretching algorithm as a form of data augmentation might help eliminate this problem.

adefossez commented 2 years ago

Hello @brubsby, this is an interesting issue. Would you have some examples from before and after stretching?

brubsby commented 2 years ago

Here's an example of switching out time stretching algorithms with a set of Demucs 3 Model B output stems. We have been using the website mvsep.com to split stems, which doesn't seem to document a version number or anything for how recently they grabbed the Demucs 3 Model B implementation, but we assume any differences would be negligible.

Resample in FL Studio is basically normal playback, i.e. no stretching algorithm recalculating the audio. Switching to other stretch methods (even if no BPM change happens) introduces phasing, which is especially pronounced when individual stems are set to a stretch mode while others are set to resample. But we believe there is still phasing even if all stems are set to the same stretch mode.

The main portion of this example that exhibits the problem first plays unstretched at 0:24, a particularly noisy impact. It seems that noise in songs, non-tonal risers, effects, or drums are where this problem is most noticeable.

(please excuse the starting image quality, I had to lower it to keep the audio high quality while getting the file under 10MB)

https://user-images.githubusercontent.com/57653502/151625358-bf3ecabd-e1cb-4738-b8ed-b4b3b2a00ee6.mp4

Documentation on the different FL Studio stretching modes shown in the video: https://www.image-line.com/fl-studio-learning/fl-studio-online-manual/html/chansettings_sampler.htm#:~:text=Mode%20(Stretch%20Method)

Dyslexicon commented 2 years ago

Hi guys, I've noticed this too: any previously pitch-shifted audio, run through Demucs, comes out with more MP3-ish watery artifacts and threshold dropouts in the stems, and overall reduced clarity compared with the exact same audio that has not been pitch-shifted prior to being run through Demucs. The workaround is to render your Demucs stems before doing any speed-correction or syncing work with the audio. It may not be an issue with Demucs; it may just be that pitch-shifting audio after digitizing is inherently destructive, and Demucs reveals what previously went unnoticed.

adefossez commented 2 years ago

Do you get any effect if you have a single stem playing and you stretch it? There might be some bleeding between stems that is not handled in the same way and ends up being phase-shifted between the stems.

Dyslexicon commented 2 years ago

Hi, stretching/squeezing audio that has already been run through Demucs does not result in audible artifacts/degradation, nor in phasing when stems are recombined.

Audio that has been stretched/squeezed prior to being run through Demucs does result in lower-quality stems. Again, this is most likely not an issue with Demucs: it is more likely that squeezing/stretching and pitch-shifting are inherently destructive, and Demucs is simply revealing signal degradation that would otherwise go unnoticed if the audio had not been split into stems.

I just wanted to point out that I have observed this phenomenon, for anyone else with a workflow involving syncing or pitch-shifting; it is advisable to perform time stretching/squeezing or pitch-shift alterations on already rendered stems, and not on the original audio prior to running through Demucs.

Thanks guys

adefossez commented 2 years ago

I believe @brubsby has an issue where the artifacts appear when stretching a separated track. It would be good to know if the original track had any processing before being fed to Demucs.

The model is actually trained with time stretching for data augmentation of its inputs using SoundTouch (https://www.surina.net/soundtouch/), but only within a limited regime (i.e. ±3 semitones, ±15% BPM).
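To give a feel for how narrow that regime is, here is a minimal sketch that samples augmentation settings inside the quoted range and converts them to ratios. It is not the Demucs training code and it does not call SoundTouch; the function names, the uniform sampling, and the choice of whole-semitone pitch steps are assumptions made for illustration only.

```python
import random

# Limits quoted in the comment above; everything else here is hypothetical.
MAX_PITCH_SEMITONES = 3
MAX_TEMPO_PERCENT = 15.0


def semitones_to_ratio(semitones):
    """Frequency ratio for a pitch shift in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def sample_augmentation(rng):
    """Draw one (pitch, tempo) setting inside the quoted limits.

    Uniform sampling and integer semitone steps are assumptions.
    """
    pitch = rng.randint(-MAX_PITCH_SEMITONES, MAX_PITCH_SEMITONES)
    tempo = rng.uniform(-MAX_TEMPO_PERCENT, MAX_TEMPO_PERCENT)
    return {
        "pitch_semitones": pitch,
        "tempo_percent": tempo,
        "pitch_ratio": semitones_to_ratio(pitch),
        # A faster tempo shortens the clip by this factor.
        "duration_ratio": 1.0 / (1.0 + tempo / 100.0),
    }


rng = random.Random(0)
for _ in range(3):
    print(sample_augmentation(rng))
```

A shift of 3 semitones corresponds to a frequency ratio of 2^(3/12), roughly 1.19, so anything a DAW does beyond about a 19% pitch change or a 15% tempo change would fall outside what the model saw in training, if the quoted limits are taken at face value.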

brubsby commented 2 years ago

For my use case, there was zero stretching/processing done beforehand. Interesting to note that that data augmentation scheme was already implemented! (I guess I should've read the whole paper more carefully). Perhaps the difference between the open source SoundTouch algorithm and the proprietary stretching algorithms used by different DAWs may introduce an effect?

Since posting the original issue, we're more certain that it only happens when the stretching modes are different between stems. So it's less of an issue than we previously thought. However, I still believe the problem is interesting to study. It could possibly be due to different frequency bands of white noise being split into different stems, and then the different algorithms treating the bands of white noise differently? It almost introduces a comb-filter-esque effect.

adefossez commented 2 years ago

Yes, definitely. I think the issue is that some leakage between sources gets spread out over all the stems and, if the stems are stretched in different ways, it introduces phasing artifacts when they are mixed back together. I think this won't happen when mixing stems from different tracks, for instance, or when using the exact same stretching on every stem.
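The leakage explanation can be illustrated with a small calculation. The sketch below assumes a deliberately simplified model: one source is present in its own stem with weight `1 - leak` and in a second stem with weight `leak`, and the second stem comes out of its stretcher a fixed number of samples late. Real stretching algorithms change phase in a time-varying, frequency-dependent way, so the fixed delay is only a stand-in, and all names here are hypothetical.

```python
import cmath
import math


def summed_gain(freq_hz, delay_samples, leak=0.5, sample_rate=44100):
    """Gain on one source after its two copies are mixed back together.

    Models |(1 - leak) + leak * exp(-j * omega * d)|: the direct copy plus
    a leaked copy that arrives delay_samples later.
    """
    omega = 2.0 * math.pi * freq_hz / sample_rate
    return abs((1.0 - leak) + leak * cmath.exp(-1j * omega * delay_samples))


def first_notch_hz(delay_samples, sample_rate=44100):
    """Lowest frequency at which the delayed copy is exactly out of phase."""
    return sample_rate / (2.0 * delay_samples)


def ripple_db(leak):
    """Peak-to-dip depth of the comb in dB (infinite for two equal copies)."""
    dip = abs(1.0 - 2.0 * leak)
    return math.inf if dip == 0.0 else 20.0 * math.log10(1.0 / dip)


# Compare no delay with a 10-sample offset for a 10% leak.
for d in (0, 10):
    gains = [round(summed_gain(f, d, leak=0.1), 3) for f in (0, 2205, 4410)]
    print(f"delay={d:2d} samples -> gains at 0/2205/4410 Hz: {gains}")
print("ripple for 10% leak:", round(ripple_db(0.1), 2), "dB")
```

With no relative delay the two copies add back to unity gain at every frequency, which fits the observation that identical stretch settings on all stems avoid the problem. Once the copies are offset, the gain dips at regularly spaced frequencies, which is the comb-filter shape described earlier in the thread; if the offset drifts over time, those dips move, which is one plausible reading of the "phasing" sound.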

awesomer commented 2 years ago

@brubsby: if you don't mind me asking, do you have a link to the remix community in question?