Open s2t2 opened 1 year ago
@s2t2 The two columns are the left and right channels of the stereo signal. The Spleeter model is trained with stereo input.
You can keep both channels to represent the vocals, or average them to get a mono signal.
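A minimal sketch of the averaging step, using only NumPy. The array here is a synthetic stand-in for a stem of shape `(n_samples, 2)`; the helper name `to_mono` and the test signal are assumptions for illustration, not part of Spleeter.

```python
import numpy as np

def to_mono(stem: np.ndarray) -> np.ndarray:
    """Average the channel axis of a (n_samples, n_channels) waveform."""
    return stem.mean(axis=1)

# Synthetic stand-in for a stereo stem: left and right differ slightly.
t = np.linspace(0.0, 1.0, 8, endpoint=False)
left = np.sin(2 * np.pi * t)
right = 0.9 * np.sin(2 * np.pi * t)
vocals = np.stack([left, right], axis=1)  # shape (8, 2)

mono = to_mono(vocals)  # shape (8,)
```

If you need to keep the stereo image, skip the averaging and use both columns as they are.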
@biendltb thanks for the info!
I compared two ways of getting the vocals: calling `separate_to_file` and reloading the result with librosa, and calling `separate` and averaging the two vocal columns. The values are not the same.
I wonder why that is.
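One possible contributor, offered as an assumption rather than a confirmed diagnosis: a waveform written to disk as 16-bit PCM is quantized, so it will not match the in-memory float array exactly. Separately, `librosa.load` resamples to 22050 Hz and downmixes to mono by default; passing `sr=None, mono=False` keeps the file's own rate and channels (returned as `(channels, samples)`). The sketch below only simulates the quantization effect on random data; `pcm16_round_trip` is a hypothetical helper, not a Spleeter or librosa function.

```python
import numpy as np

def pcm16_round_trip(x: np.ndarray) -> np.ndarray:
    """Simulate writing floats in [-1, 1] as 16-bit PCM and reading them back."""
    q = np.round(np.clip(x, -1.0, 1.0) * 32767.0).astype(np.int16)
    return q.astype(np.float32) / 32767.0

# Random stand-in for an in-memory stereo stem (assumed shape: n_samples x 2).
rng = np.random.default_rng(0)
in_memory = rng.uniform(-0.5, 0.5, size=(1000, 2)).astype(np.float32)

# What the same data might look like after a 16-bit file round trip.
from_file = pcm16_round_trip(in_memory)

max_err = float(np.max(np.abs(in_memory - from_file)))
exactly_equal = np.array_equal(in_memory, from_file)
close = np.allclose(in_memory, from_file, atol=1e-4)
```

When comparing the two paths, a tolerance-based check such as `np.allclose` is more informative than exact equality, provided sample rate and channel layout already match.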
Could you please provide more context about the two columns returned by the raw waveform based separation method?
I noticed there are two columns for each stem, and this is consistent across 2, 4, and 5 stem models.
For example, the vocals are returned in two columns. When I play the data in the first column, it sounds like the vocals. When I play the data in the second column, it also sounds like the vocals. However, their values are slightly different.
So why are there two columns? What is the difference between their values? If we want to represent the vocals, should we use the first column, the second column, both, or an average?
Thanks!