deezer / spleeter

Deezer source separation library including pretrained models.
https://research.deezer.com/projects/spleeter.html
MIT License

[Bug] Spleeter returns previous prediction when running more than once #822

Open MrBanhBao opened 1 year ago

MrBanhBao commented 1 year ago

Description

Spleeter returns the prediction for the previous waveform when the separate method is called more than once on the same Separator object.

e.g.:
file1 --separate--> prediction of file1
file2 --separate--> prediction of file1
file3 --separate--> prediction of file2

I am using spleeter version 2.3.2, installed with pip in a conda environment with python 3.8.15.

Steps to reproduce

I wrote a small script which should illustrate and reproduce the bug:

from spleeter.separator import Separator
from spleeter.audio.adapter import AudioAdapter
import sounddevice as sd

path1 = '../../Downloads/Doja Cat - Say So (Official Video) (152kbit_Opus).opus'
path2 = '../../Downloads/Fred again  - Danielle (smile on my face) [Visualiser]/Fred again.. - Danielle (smile on my face) [Visualiser] (152kbit_Opus).opus'
path3 = '../../Downloads/Retrograde (Original Mix)/Retrograde (Original Mix) (128kbit_AAC).wav'
sample_rate = 44100

seconds = 5
start = sample_rate*41
end = int(start+sample_rate*seconds)

separator = Separator('spleeter:2stems')

audio_loader = AudioAdapter.default()
waveform1, _ = audio_loader.load(path1, sample_rate=sample_rate)
print(f'Shape of waveform1: {waveform1.shape}')

waveform2, _ = audio_loader.load(path2, sample_rate=sample_rate)
print(f'Shape of waveform2: {waveform2.shape}')

waveform3, _ = audio_loader.load(path3, sample_rate=sample_rate)
print(f'Shape of waveform3: {waveform3.shape}')

prediction1 = separator.separate(waveform1)
print(f'Shape of prediction1 vocal: {prediction1["vocals"].shape}')
prediction2 = separator.separate(waveform2)
print(f'Shape of prediction2 vocal: {prediction2["vocals"].shape}')
prediction3 = separator.separate(waveform3)
print(f'Shape of prediction3 vocal: {prediction3["vocals"].shape}')

print('Play vocal prediction 1')
sd.play(prediction1['vocals'][start:end], sample_rate)
sd.wait()

print('Play vocal prediction 2')
sd.play(prediction2['vocals'][start:end], sample_rate)
sd.wait()

print('Play vocal prediction 3')
sd.play(prediction3['vocals'][start:end], sample_rate)
sd.wait()
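
Listening to each clip works, but the same check can be done without audio playback. The following is only a minimal sketch, not part of the original script: the helper names fingerprint and find_duplicates are hypothetical, and it assumes the values returned by separate() are NumPy arrays (or anything else exposing tobytes()) and that a stale result is byte-identical to the earlier one.

```python
import hashlib


def fingerprint(waveform):
    """Return a short content hash of an array-like object.

    Assumption: the object exposes tobytes(), as NumPy arrays and
    array.array do.
    """
    return hashlib.sha256(waveform.tobytes()).hexdigest()[:16]


def find_duplicates(named_arrays):
    """Return (first_name, later_name) pairs whose contents are byte-identical."""
    seen = {}          # digest -> name of the first array with that content
    duplicates = []
    for name, waveform in named_arrays.items():
        digest = fingerprint(waveform)
        if digest in seen:
            duplicates.append((seen[digest], name))
        else:
            seen[digest] = name
    return duplicates
```

With the script above, this could be called as find_duplicates({'prediction1': prediction1['vocals'], 'prediction2': prediction2['vocals'], 'prediction3': prediction3['vocals']}). If the second call really returned the first file's result unchanged, the pair ('prediction1', 'prediction2') should show up in the returned list.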

Environment

OS: Linux (Pop!_OS)
Installation type: pip
RAM available: 16 GB
Hardware spec: GPU: RTX 3070 (Mobile) / CPU: i7-12700H

Additional context

EtienneAb3d commented 1 year ago

Same problem here, using separate_to_file().

Any known solution?

Serge-Andre-MASSON commented 1 year ago

A first workaround is to change the STFTBackend used by the separator:

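from spleeter.audio import STFTBackend
from spleeter.separator import Separator

# Force the librosa STFT backend instead of the TensorFlow one
backend = STFTBackend.LIBROSA
separator = Separator('spleeter:2stems', stft_backend=backend)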

I believe there is an issue with the way the data generator is updated when using STFTBackend.TENSORFLOW (the backend is chosen automatically depending on your environment settings; for me it was LIBROSA).
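
To make the suspicion concrete, here is a toy model of how a long-lived generator that picks up its input one step too early would produce exactly the reported pattern. This is not Spleeter's code and not a confirmed diagnosis: the class LaggingSeparator and its internals are hypothetical and only illustrate one way a stale data generator can lag by one call.

```python
class LaggingSeparator:
    """Toy stand-in (NOT Spleeter): one generator is reused across calls."""

    def __init__(self):
        self._data = None        # latest input handed to separate()
        self._generator = None   # created once, then reused

    def _predictions(self):
        current = self._data
        while True:
            # The next input is read *before* the current result is yielded,
            # so it is captured one call too early.
            upcoming = self._data
            yield f"prediction of {current}"
            current = upcoming

    def separate(self, name):
        self._data = name
        if self._generator is None:
            self._generator = self._predictions()
        return next(self._generator)


if __name__ == "__main__":
    separator = LaggingSeparator()
    for name in ["file1", "file2", "file3"]:
        print(name, "->", separator.separate(name))
```

In this toy, the first call is correct and every later call returns the result for the previous input, which matches the behaviour described in the issue. Whether the TensorFlow backend actually fails this way still needs to be checked.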

I will look further into this and come back with a solution as soon as I can.