[Bug] Spleeter returns previous prediction when running more than once

MrBanhBao commented 1 year ago

[x] I didn't find a similar issue already open.
[x] I read the documentation (README AND Wiki)
[x] I have installed FFMpeg
[x] My problem is related to Spleeter only, not a derivative product (such as Webapplication, or GUI provided by others)

Description

Spleeter returns the prediction for the previous waveform when the separate method is called more than once on the same Separator object.

e.g.:
file1 --separate--> prediction of file1
file2 --separate--> prediction of file1
file3 --separate--> prediction of file2

I am using Spleeter 2.3.2, installed with pip in a conda environment with Python 3.8.15.

Steps to reproduce

I wrote a small script which should illustrate and reproduce the bug:

from spleeter.separator import Separator
from spleeter.audio.adapter import AudioAdapter
import sounddevice as sd

path1 = '../../Downloads/Doja Cat - Say So (Official Video) (152kbit_Opus).opus'
path2 = '../../Downloads/Fred again  - Danielle (smile on my face) [Visualiser]/Fred again.. - Danielle (smile on my face) [Visualiser] (152kbit_Opus).opus'
path3 = '../../Downloads/Retrograde (Original Mix)/Retrograde (Original Mix) (128kbit_AAC).wav'
sample_rate = 44100

seconds = 5
start = sample_rate*41
end = int(start+sample_rate*seconds)

separator = Separator('spleeter:2stems')

audio_loader = AudioAdapter.default()
# These are the full input mixes, not vocal stems, so the labels say so.
waveform1, _ = audio_loader.load(path1, sample_rate=sample_rate)
print(f'Shape of waveform1: {waveform1.shape}')

waveform2, _ = audio_loader.load(path2, sample_rate=sample_rate)
print(f'Shape of waveform2: {waveform2.shape}')

waveform3, _ = audio_loader.load(path3, sample_rate=sample_rate)
print(f'Shape of waveform3: {waveform3.shape}')

prediction1 = separator.separate(waveform1)
print(f'Shape of prediction1 vocal: {prediction1["vocals"].shape}')
prediction2 = separator.separate(waveform2)
print(f'Shape of prediction2 vocal: {prediction2["vocals"].shape}')
prediction3 = separator.separate(waveform3)
print(f'Shape of prediction3 vocal: {prediction3["vocals"].shape}')

print('Play vocal prediction 1')
sd.play(prediction1['vocals'][start:end], sample_rate)
sd.wait()

print('Play vocal prediction 2')
sd.play(prediction2['vocals'][start:end], sample_rate)
sd.wait()

print('Play vocal prediction 3')
sd.play(prediction3['vocals'][start:end], sample_rate)
sd.wait()
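
The lag can also be checked without listening to the output. This is a minimal sketch, assuming the three files have different durations and that each prediction has as many samples as the waveform it was computed from (an assumption, not verified against Spleeter's internals). match_outputs_to_inputs is a hypothetical helper, not part of Spleeter, and the sample counts are made up for illustration:

```python
from typing import List, Optional, Sequence


def match_outputs_to_inputs(
    input_lengths: Sequence[int], output_lengths: Sequence[int]
) -> List[Optional[int]]:
    """For each output length, return the index of the single input with the
    same length, or None if no input or more than one input matches."""
    matches: List[Optional[int]] = []
    for out_len in output_lengths:
        candidates = [
            i for i, in_len in enumerate(input_lengths) if in_len == out_len
        ]
        matches.append(candidates[0] if len(candidates) == 1 else None)
    return matches


# Hypothetical sample counts for three files of different durations.
inputs = [1000, 2500, 1800]
# Lengths as they would look with the lag described above: file1, file1, file2.
outputs = [1000, 1000, 2500]
print(match_outputs_to_inputs(inputs, outputs))  # [0, 0, 1]
```

With the script above, the inputs would be [w.shape[0] for w in (waveform1, waveform2, waveform3)] and the outputs [p['vocals'].shape[0] for p in (prediction1, prediction2, prediction3)]. A result of [0, 0, 1] instead of [0, 1, 2] would point to the lag.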

Environment


OS	Linux (popOS)
Installation type	pip
RAM available	16 GB
Hardware spec	GPU: RTX 3070 (Mobile) / CPU: i7-12700H

Additional context

EtienneAb3d commented 1 year ago

Same problem here, using separate_to_file():

first processing is ok
second processing returns a copy of the first one
third processing returns what should have been the second one
etc

Any known solution?

Serge-Andre-MASSON commented 1 year ago

A first workaround is to change the STFTBackend used by the separator:

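from spleeter.audio import STFTBackend
from spleeter.separator import Separator

# Force the librosa STFT backend instead of letting Spleeter pick one.
backend = STFTBackend.LIBROSA
separator = Separator('spleeter:2stems', stft_backend=backend)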

I believe there is an issue with the way the data generator is updated when using STFTBackend.TENSORFLOW (the backend is chosen automatically depending on your environment settings; for me it was LIBROSA).

I will look further into this and come back with a solution as soon as I can.
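
To make the suspected mechanism concrete, here is a toy model of how a long-lived generator can lag by one call. It is not Spleeter's code, and it is not established that this is what happens inside Spleeter. ToySeparator and its methods are hypothetical names. The sketch assumes a prediction pipeline that fetches its next input before the caller has had a chance to update it:

```python
from typing import Iterator, Optional


class ToySeparator:
    """Toy stand-in for an object that keeps one prediction generator alive."""

    def __init__(self) -> None:
        self._current: Optional[str] = None
        self._predictions: Optional[Iterator[str]] = None

    def _inputs(self) -> Iterator[Optional[str]]:
        # Endless input generator that always yields the latest stored item.
        while True:
            yield self._current

    def _predict(self, inputs: Iterator[Optional[str]]) -> Iterator[str]:
        pending = next(inputs)
        while True:
            result = f"prediction of {pending}"
            # Prefetch: the next input is read *before* the result is handed
            # back, so it is still the item from the current call.
            pending = next(inputs)
            yield result

    def separate(self, item: str) -> str:
        self._current = item
        if self._predictions is None:
            self._predictions = self._predict(self._inputs())
        return next(self._predictions)


toy = ToySeparator()
for name in ("file1", "file2", "file3"):
    print(name, "->", toy.separate(name))
# file1 -> prediction of file1
# file2 -> prediction of file1
# file3 -> prediction of file2
```

The toy reproduces the pattern from the report: the first call is correct, the second repeats the first result, and every later call is one step behind.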
