Open xiebruce opened 2 years ago
Hi @xiebruce, for your first question, the wiki has some information about the parameters set in the config files, though it is possibly a bit incomplete. The instrument list should be set according to the dataset you want to train on. For instance, with musdb you can use ["vocals", "instrumentals"], since every track has both an instrumentals.wav and a vocals.wav file, and they sum up to mixture.wav.
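To make the "sum up to mixture.wav" condition concrete, here is a minimal sketch of how it could be checked. It is not part of spleeter: `max_residual` is a hypothetical helper, and plain Python lists stand in for mono audio samples that you would normally load from the wav files with an audio library.

```python
def max_residual(mix, stems):
    """Largest absolute difference between the mix and the sample-wise sum of the stems.

    `mix` is a sequence of samples; `stems` is a list of sequences of the same length.
    (Hypothetical helper for illustration, not a spleeter function.)
    """
    if not stems:
        raise ValueError("at least one stem is required")
    if any(len(stem) != len(mix) for stem in stems):
        raise ValueError("every stem must have as many samples as the mix")
    # zip(*stems) yields one tuple per sample position, holding that sample from each stem.
    return max(
        (abs(m - sum(samples)) for m, samples in zip(mix, zip(*stems))),
        default=0.0,
    )


# Toy data standing in for vocals.wav, instrumentals.wav and mixture.wav.
vocals = [0.1, -0.2, 0.3]
instrumentals = [0.05, 0.4, -0.1]
mix = [v + i for v, i in zip(vocals, instrumentals)]

print(max_residual(mix, [vocals, instrumentals]))  # close to zero: the stems add up to the mix
print(max_residual(mix, [vocals]))                 # clearly non-zero: a stem is missing
```

With real files you would compare the residual against a small tolerance rather than zero, since encoding and resampling can introduce rounding differences.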
For your second question, spleeter was not made for dealing with the multi-stem *.mp4 format, so you should use the multiple-waveform version of musdb. You can use a different split from the one originally proposed for musdb, which is only provided for algorithm comparison purposes. So if you don't plan to compare your model with other models on the test set, you can use songs from the test set in your training.
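As an illustration of a custom split, a reproducible train/validation partition of a track list could be sketched as follows. `split_tracks` and the `track_000`-style names are made up for the example; spleeter itself only reads whatever CSV files you point it at.

```python
import random


def split_tracks(track_names, n_validation, seed=0):
    """Return (train, validation) lists covering every track exactly once.

    Hypothetical helper: sorts first so the result does not depend on
    directory listing order, then shuffles with a fixed seed.
    """
    if not 0 < n_validation < len(track_names):
        raise ValueError("n_validation must be between 1 and len(track_names) - 1")
    shuffled = sorted(track_names)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_validation:], shuffled[:n_validation]


# 150 placeholder track names, e.g. the musdb train and test folders pooled together.
tracks = [f"track_{i:03d}" for i in range(150)]
train, validation = split_tracks(tracks, n_validation=50)
print(len(train), len(validation))
```

Fixing the seed keeps the split stable between runs, so a validation track never silently moves into the training set.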
The third question concerns musdb and not spleeter, so this is not the right place for asking/answering it.
For your fourth question: indeed, you can provide only two columns if you'd like to perform 2-stem separation. As mentioned in the answer to question 2, you need to ensure that the provided stems sum up to the mix (i.e. the sum of the stems is equal to the mix). With musdb, you can do it with the instrumentals stem and the vocals stem.
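A sketch of how such a reduced CSV could be generated is below. The column names (`mix_path`, `<instrument>_path`, `duration`) are an assumption based on the pattern of the sample musdb_train.csv, so compare them with the header of your own copy before training; `write_stem_csv` and the `train/song_a` path are purely illustrative.

```python
import csv
import io


def write_stem_csv(handle, tracks, instruments=("vocals", "instrumentals")):
    """Write one row per track: mix path, one path per stem, then the duration.

    `tracks` is an iterable of (track_dir, duration_in_seconds) pairs.
    Column names are assumed, not taken from spleeter's documentation.
    """
    header = ["mix_path"] + [f"{name}_path" for name in instruments] + ["duration"]
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(header)
    for track_dir, duration in tracks:
        row = [f"{track_dir}/mixture.wav"]
        row += [f"{track_dir}/{name}.wav" for name in instruments]
        row.append(duration)
        writer.writerow(row)
    return header


# Write to an in-memory buffer here; pass an open file to create a real CSV.
buffer = io.StringIO()
header = write_stem_csv(buffer, [("train/song_a", 201.5)])
print(buffer.getvalue())
```

The stem names passed as `instruments` should match the instrument list in the config file, which is why both are driven by the same tuple here.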
Hello! I am doing the same but with Beethoven Cello Sonatas.
How many hours of samples/data are you using to train spleeter?
Thanks!
@isolepinas Sorry, I still don't know how to do it yet. But I think this depends on your computer's performance and the size of the sample data. Can you share the whole process, the steps that you are following? I prefer examples and screenshots to plain descriptions. Thank you in advance.
Dear Bruce,
Just like you, I am in the first steps of the process. My idea is to feed spleeter 3 versions of a performance: (piano solo), (cello solo) and (piano and cello together). I aim to train spleeter to understand what a piano is and what a cello is, so that when they play together it can extract only the cello without losing vibrato, portamento and other characteristics.
Some studies have used 44 samples or 1h and 14 minutes such as here https://veleslavia.github.io/conditioned-u-net/
Maybe you will find it interesting!
Let's keep in touch so we can share our findings and processes!
@isolepinas OK, thank you.
Below is the content from Here and I've read it.
Train model
For training your own model, you need:
From the command above, I notice that I need to provide:
musdb_config.json file; musdb
Question 1
A musdb_config.json file looks like the one below (copied from here). But where can I get the full documentation for it? For example, what do T and F mean? For instrument_list, can I only use ["vocals", "other"]? Where can I get the full documentation of all these config options?
Question 2
I've downloaded musdb as musdb18.zip and extracted it. It is a folder containing 2 folders: train and test (see the screenshot below).
Inside the train and test folders, there are only mp4 files (mp4 is used instead of mp3 or aac because an mp4 can contain more than one track).
I've listened to the mp4 files in the train folder and the test folder of musdb18.zip, and there doesn't seem to be any difference between them: they are all songs. So in my understanding, there is no difference. Assuming that I have 150 song files, can I choose 100 of them for training and the rest for validation?
Question 3
I used ffprobe to check the mp4 files mentioned above and found that each has many tracks: the first track is a mix of all audio tracks, the other audio tracks are the separated tracks (vocals, drums, bass, etc.), and the last track is a video track, although in fact it contains no video, just a still png image.
Now I have vocals.m4a and bg-musics.m4a. How can I merge a vocal track, its corresponding bg-music and an album cover image into an mp4 file using ffmpeg?
Question 4
From question 1 we know that we also need 2 csv files: musdb_train.csv and musdb_validation.csv.
I notice that the musdb_train.csv file has 6 columns:
If I only need 2 stems, is it that I only need to provide these 4 columns in this csv file?