facebookresearch / demucs

Code for the paper Hybrid Spectrogram and Waveform Source Separation
MIT License

I get a padding error and I've tried reducing the audio length to a few seconds to no avail #600

Open MotorCityCobra opened 1 month ago

MotorCityCobra commented 1 month ago

🐛 Bug Report

I've tried to make the script as simple as possible to isolate the vocals.

To Reproduce

import torch
import torchaudio
from demucs.audio import AudioFile, save_audio
from demucs.apply import apply_model
from demucs.pretrained import get_model

def separate_vocals(track_path, output_path):
    # Load the pretrained model
    model = get_model('955717e8')
    model.eval()
    if torch.cuda.is_available():
        model.to('cuda')

    # Load audio
    audio = AudioFile(track_path).read(streams=0, samplerate=model.samplerate, channels=model.audio_channels)

    # Normalize audio
    mean = audio.mean(0, keepdim=True)
    std = audio.std(0, keepdim=True)
    audio = (audio - mean) / std

    # Apply the model
    with torch.no_grad():
        sources = apply_model(model, audio[None].cuda(), shifts=0)

    # Rescale back the output
    sources = sources * std.cuda() + mean.cuda()

    # Save only the vocals
    save_audio(sources[0][0], output_path, samplerate=model.samplerate)  # Assuming the first source is vocals

# Example usage
track_path = 'C:/Users/ooo/tor/rvc-data-prep/k_isolate/input/k_and_j.mp3'
output_path = 'C:/Users/ooo/tor/rvc-data-prep/k_isolate/output/vocals.wav'
separate_vocals(track_path, output_path)
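A note on the "Assuming the first source is vocals" comment: demucs models list their stem names in `model.sources`, and the second axis of the `apply_model` output follows that order, so it is safer to look the index up by name than to hard-code 0. A minimal sketch, where `source_index` is a hypothetical helper and the stem order shown is only an assumption for illustration (check `model.sources` on the model you actually loaded):

```python
def source_index(source_names, target="vocals"):
    """Return the position of `target` in a list of stem names."""
    try:
        return source_names.index(target)
    except ValueError:
        raise ValueError(f"{target!r} not found in {source_names}") from None


# Assumed stem order, for illustration only; read the real one from model.sources.
ASSUMED_SOURCES = ["drums", "bass", "other", "vocals"]
vocals_idx = source_index(ASSUMED_SOURCES)
print(vocals_idx)

# With demucs this would look roughly like:
#   idx = source_index(model.sources)
#   save_audio(sources[0][idx], output_path, samplerate=model.samplerate)
```

If the assumed order holds, `sources[0][0]` in the script above would be the drums stem rather than the vocals.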

Expected behavior

I expect vocals.wav to contain just the vocals from the original audio, or at least for some file to be written.

Actual Behavior

No file is written because I get this error:

(iso_vocals) C:\Users\ooo\tor\rvc-data-prep>python iso_simple.py
Traceback (most recent call last):
  File "C:\Users\ooo\tor\rvc-data-prep\iso_simple.py", line 35, in <module>
    separate_vocals(track_path, output_path)
  File "C:\Users\ooo\tor\rvc-data-prep\iso_simple.py", line 24, in separate_vocals
    sources = apply_model(model, audio[None].cuda(), shifts=0)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ooo\tor\iso_vocals\Lib\site-packages\demucs\apply.py", line 250, in apply_model
    chunk_out = future.result()
                ^^^^^^^^^^^^^^^
  File "C:\Users\ooo\tor\iso_vocals\Lib\site-packages\demucs\utils.py", line 129, in result
    return self.func(*self.args, **self.kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ooo\tor\iso_vocals\Lib\site-packages\demucs\apply.py", line 271, in apply_model
    out = model(padded_mix)
          ^^^^^^^^^^^^^^^^^
  File "C:\Users\ooo\tor\iso_vocals\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ooo\tor\iso_vocals\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ooo\tor\iso_vocals\Lib\site-packages\demucs\htdemucs.py", line 538, in forward
    z = self._spec(mix)
        ^^^^^^^^^^^^^^^
  File "C:\Users\ooo\tor\iso_vocals\Lib\site-packages\demucs\htdemucs.py", line 435, in _spec
    x = pad1d(x, (pad, pad + le * hl - x.shape[-1]), mode="reflect")
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ooo\tor\iso_vocals\Lib\site-packages\demucs\hdemucs.py", line 39, in pad1d
    assert (out[..., padding_left: padding_left + length] == x0).all()
AssertionError
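One thing that may be worth ruling out, purely as a hypothesis that nothing in this thread confirms: the failing assertion compares the padded tensor's centre with the input using `==`, and NaN never compares equal to itself. The normalization in the script takes `mean` and `std` across the channel axis (dimension 0), so any sample where both channels hold the same value (digital silence, for instance) has a standard deviation of 0 and becomes 0 / 0. The sketch below mimics that in plain Python for a single stereo sample; `normalize_sample` and `center_matches` are hypothetical helpers, not demucs functions.

```python
import math
import statistics


def normalize_sample(left, right):
    """Mimic (audio - mean) / std across the channel axis for one stereo sample."""
    mean = (left + right) / 2
    std = statistics.stdev([left, right])  # sample std (n - 1), like torch's default
    if std == 0:
        # torch evaluates 0 / 0 to NaN instead of raising, so emulate that here.
        return math.nan, math.nan
    return (left - mean) / std, (right - mean) / std


def center_matches(original, padded_center):
    """Elementwise equality, in the spirit of the assertion shown in the traceback."""
    return all(a == b for a, b in zip(original, padded_center))


silent = normalize_sample(0.0, 0.0)   # both channels equal
loud = normalize_sample(0.2, -0.4)    # channels differ

print(center_matches(list(loud), list(loud)))      # True
print(center_matches(list(silent), list(silent)))  # False: NaN != NaN
```

If this is what is happening, checking `torch.isnan(audio).any()` right after the normalization step would show it. The sketch also shows that any sample with differing channels collapses to roughly ±0.707, so a per-channel-pair normalization like this would not preserve the waveform in any case.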

Your Environment

PyTorch on CUDA 12.4, with cuda.is_available() returning True. The audio file is an MP3: 44100 Hz, 320 kbps, 2 channels.

torch.__version__: '2.4.0.dev20240521+cu124'

CarlGao4 commented 1 month ago

Please use torch < 2.2. The latest version you can use is 2.1.2.
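For anyone who wants to guard a script against this, here is a small sketch of a version check. It takes the 2.2 limit from the comment above at face value rather than verifying it, and `major_minor` and `below_limit` are hypothetical helper names. It parses the string by hand so that nightly tags such as `.dev20240521+cu124` do not get in the way.

```python
import re


def major_minor(version):
    """Extract (major, minor) from a string such as '2.4.0.dev20240521+cu124'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def below_limit(version, limit=(2, 2)):
    """True if `version` is strictly older than `limit` (2.2 is assumed, per the comment above)."""
    return major_minor(version) < limit


print(below_limit("2.4.0.dev20240521+cu124"))  # False: the nightly from this report
print(below_limit("2.1.2"))                    # True: the suggested release

# In a real script: below_limit(torch.__version__)
```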

MotorCityCobra commented 1 month ago

My Torch version, which I should have included in the original post:


>>> import torch
>>> torch.__version__
'2.4.0.dev20240521+cu124'

Somehow I was able to get it to work, but only with the older model and only by calling the module from the command line.

python -m demucs.separate -n mdx_extra_q c:/path/to/my/audio.mp3

But I think this uses the same version of torch. It has to.