ZFTurbo / Music-Source-Separation-Training

Repository for training models for music source separation.

Post your model #1

ZFTurbo opened this issue 9 months ago

ZFTurbo commented 9 months ago

To post your model, please fill in the form:

Description: 
Instruments:
Dataset (if known):
Metrics (if known):
Config link: 
Checkpoint link: 
Beatloo-Labs commented 4 months ago

Model Type: bs_roformer
Description: My first five days of model training, on three servers with GPU T4 x2
Instruments: vocals, drums, bass
Dataset: musdb18hq

How to run (example for vocals): download the config and checkpoint, save them to the folder with @ZFTurbo's training code, and run this command:

python inference.py --model_type bs_roformer --config_path config_musdb18_bs_roformer_vocals.yaml --start_check_point model_vocals_bs_roformer_ep_5_sdr_8.0972.ckpt --input_folder input/ --store_dir separation_results/

Put the input files in the input folder; the results are written to the separation_results folder.
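If you want to run the same command from Python (for example to loop over the vocals, drums and bass models), a minimal sketch follows. It only uses the flags from the command above; the helper names `build_inference_cmd` and `run_inference` are hypothetical, and it assumes `inference.py` is in the current working directory.

```python
import subprocess
import sys


def build_inference_cmd(model_type, config_path, checkpoint,
                        input_folder="input/", store_dir="separation_results/"):
    """Build the argument list for inference.py (hypothetical helper)."""
    return [
        sys.executable, "inference.py",
        "--model_type", model_type,
        "--config_path", config_path,
        "--start_check_point", checkpoint,
        "--input_folder", input_folder,
        "--store_dir", store_dir,
    ]


def run_inference(model_type, config_path, checkpoint, **kwargs):
    """Run inference.py; check=True raises CalledProcessError on a non-zero exit."""
    cmd = build_inference_cmd(model_type, config_path, checkpoint, **kwargs)
    subprocess.run(cmd, check=True)
```

Usage for the vocals model: `run_inference("bs_roformer", "config_musdb18_bs_roformer_vocals.yaml", "model_vocals_bs_roformer_ep_5_sdr_8.0972.ckpt")`. Passing a list instead of a shell string avoids quoting problems with paths that contain spaces.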

DEMO (old models): https://disk.yandex.ru/d/zc5Bca9nTuB7jg

Vocals: SDR 8.09 (previously 7.55)
Config link: https://disk.yandex.com/d/eTOZ9BGpTIRNYw
Checkpoint link: https://disk.yandex.com/d/wPdwPZTQMJfAZQ

Drums: SDR 7.22 (previously 7.15)
Config link: https://disk.yandex.com/d/ab8glguWFltifA
Checkpoint link: https://disk.yandex.com/d/auEl3aovvWhYMw

Bass: SDR 5.78 (previously 5.28)
Config link: https://disk.yandex.com/d/mgqAPCahZQwEgQ
Checkpoint link: https://disk.yandex.com/d/BISwCadSNyYb-g

Last update: 21.07.24
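The SDR figures quoted throughout this thread are signal-to-distortion ratios in dB. A minimal sketch of the common definition, 10·log10(‖s‖² / ‖s − ŝ‖²), is shown below; it assumes (not confirmed in this thread) that this simple per-track form matches what the repo's validation reports, and adds a small eps to avoid division by zero.

```python
import math


def sdr(reference, estimate, eps=1e-7):
    """Signal-to-distortion ratio in dB for two equal-length sample sequences."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    # Energy of the true source
    signal_power = sum(r * r for r in reference)
    # Energy of the error between the true source and the model's estimate
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_power + eps) / (noise_power + eps))
```

For example, `sdr([1, 0, -1, 0], [0.9, 0, -0.9, 0])` is about 20 dB, because the error energy (0.02) is 1/100 of the signal energy (2). Since the scale is logarithmic, a gain such as 7.55 to 8.09 dB means the error energy relative to the signal dropped by roughly 12%.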

yolkispalkis commented 2 months ago

Model Type: mel_band_roformer
Description: My first attempt at training; trained for 5 days on a 4070
Instruments: percussion
Dataset: musdb18hq, moisesdb (SDR values are measured on the musdb18hq test set)

SDR of the posted checkpoints: 6.86, 7.10, 7.44, 7.68
Checkpoints: https://disk.yandex.ru/d/MxZ4k-kZ2Q5QqA

Updated on: 14.06.2024 18:20:00 UTC+3

alexclarke236 commented 2 months ago

Model Type: mel_roformer
Description: My first attempt at training
Instruments: timpani
Dataset: mvsep

alexclarke236 commented 2 months ago

Model Type: mel_band_roformer
Description: My first AI training
Instruments: percussion
Dataset: mvsep

jarredou commented 2 months ago

@alexclarke236 Would you like to share the checkpoints you've trained? The best way is to host them on a file-sharing site and post the link here, as previous users have done.

alexclarke236 commented 2 months ago

Yes


verosment commented 2 months ago

Architecture: MDX23C
Description: My first somewhat successful attempt at training. Hardware: my personal RTX 3060 12 GB, 64 GB DDR4 RAM, Ryzen 5 5600X, Windows 11. I stopped training because training on my personal machine was inconvenient and progress was slow; had I had the funds, I would have rented a GPU from vast.ai. Trained for a total of 208 epochs, or roughly 2,500 minutes.
Instruments: Strings (Cello, Double Bass, Violin, Viola), Brass (Trumpet, English Horn, Tuba, Trombone), Wind (Piccolo, Flute, Clarinet, Saxophone), Mellotron Flute & Cello. Other instruments with a similar quality or sound may be present in the dataset but are unaccounted for.
Dataset (if known): Custom 97-pair dataset using tracks from isolated-tracks.com, songstems.net, MoisesDB, ARME-Virtuoso-Strings-2.2, traditional-flute-dataset, a number of Toby Fox FLP fan recreations, and a Dolby Atmos rip of the center track of Eleanor Rigby layered over a song from MoisesDB.
Metrics (if known): SDR 4.4174 on my very small validation set. Performance of the model depends heavily on the input.
Config link: https://drive.google.com/file/d/1OTuF3534Ax5SJSsk08e2QLgoxiljqelH/view?usp=sharing
Checkpoint link: https://drive.google.com/file/d/1juOW6Q_Puqp_uxMsQpWSWkAm1QSbXIdg/view?usp=sharing

verosment commented 2 months ago

Same model as above, but trained for a further 54 epochs. It sounds better than the older model in quite a few cases but scores a lower SDR on the validation set. In my testing it picks up wind instruments better than the older model, and it may pick up string sections better too.
Instruments: Same as above
Dataset (if known): Same as above
Metrics (if known): SDR 4.0870 on my very small validation set. Performance of the model depends heavily on the input.
Config link: https://drive.google.com/file/d/1OTuF3534Ax5SJSsk08e2QLgoxiljqelH/view?usp=sharing
Checkpoint link: https://drive.google.com/file/d/1gB6RPUw_knozcY3qF--cpTczoxpkDw5O/view?usp=sharing

Edit (3/08/2024): I'm currently retraining this model with a larger dataset on the same machine, so it will take a while. I'll post the results here if it gets anywhere decent.

jarredou commented 1 month ago

Description: MDX23C drum-element separation model (to be applied to drums-only audio). n_fft = 2048 was used instead of the default 8192 to reduce the required resources. Baseline training (141 epochs) was done by @aufr33; it is not fully finished and can be improved.
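Why a smaller n_fft is lighter: a one-sided STFT with n_fft points yields n_fft // 2 + 1 frequency bins per frame, so 2048 gives about 4× fewer bins than 8192, at the cost of coarser frequency resolution. A small sketch of that arithmetic, assuming 44.1 kHz audio (the sample rate is not stated in the post):

```python
def stft_bins(n_fft):
    """Number of one-sided frequency bins produced by an STFT of size n_fft."""
    return n_fft // 2 + 1


def bin_width_hz(n_fft, sample_rate=44100):
    """Spacing between adjacent frequency bins in Hz (assumes 44.1 kHz audio)."""
    return sample_rate / n_fft


for n_fft in (8192, 2048):
    # Prints: 8192 4097 5.38, then 2048 1025 21.53
    print(n_fft, stft_bins(n_fft), round(bin_width_hz(n_fft), 2))
```

With n_fft = 2048 each bin spans about 21.5 Hz instead of about 5.4 Hz, which is a coarser view of the spectrum but shrinks the per-frame input the model has to process.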

Instruments: kick, snare, toms, hh, ride, crash

Dataset: Created by myself for this task, but it had some issues.

Metrics:
Instr SDR kick: 18.4312
Instr SDR snare: 13.6083
Instr SDR toms: 13.2693
Instr SDR hh: 6.6887
Instr SDR ride: 5.3227
Instr SDR crash: 7.5152
SDR Avg: 10.8059
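The reported SDR Avg matches a plain, unweighted mean of the six per-instrument SDRs. A quick check of that arithmetic:

```python
# Per-instrument SDR values from the metrics above
drum_sdr = {
    "kick": 18.4312,
    "snare": 13.6083,
    "toms": 13.2693,
    "hh": 6.6887,
    "ride": 5.3227,
    "crash": 7.5152,
}

# Unweighted mean: every stem counts equally, regardless of how loud it is
sdr_avg = sum(drum_sdr.values()) / len(drum_sdr)
print(f"SDR Avg: {sdr_avg:.4f}")  # SDR Avg: 10.8059
```

Because the mean is unweighted, the strong kick score lifts the average noticeably even though hh, ride and crash are all below 8 dB.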

Config & checkpoint : https://github.com/jarredou/models/releases/tag/aufr33-jarredou_MDX23C_DrumSep_model_v0.1

anvuew commented 1 month ago

Description:

mel_band_roformer dereverb model (chunk_size: 352768, dim: 256, depth: 6)

Instruments:

noreverb, reverb

Dataset:

~90 h of vocals, 76 types of reverb

Metrics:

SDR noreverb: 7.5669 (small validation set)

Config link: config Checkpoint link: ckpt

ZFTurbo commented 1 month ago

@anvuew Thank you for the great model. I have a question about your training: did you apply reverb to full tracks or to the vocal part only? Can you share your validation set? I'd like to compare your model with older ones.

anvuew commented 1 month ago


noreverb is vocals only. My validation set is too limited to share; for reference, MDX Reverb-HQ scores an SDR of 6.5 on it.

deton24 commented 1 month ago

For those who have issues running Roformers from this thread in UVR, you must delete the following line from the YAML file: linear_transformer_depth: 0
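If you have several of these configs, the edit can be scripted. A minimal sketch, assuming the setting sits on its own line exactly as quoted; it filters plain text rather than parsing YAML, so comments and formatting elsewhere in the file are preserved. The helper names are hypothetical.

```python
from pathlib import Path


def strip_linear_transformer_depth(config_text):
    """Return the YAML text without the 'linear_transformer_depth: 0' line."""
    kept = []
    for line in config_text.splitlines(keepends=True):
        # Ignore any trailing comment and surrounding whitespace when comparing
        code = line.split("#", 1)[0].strip()
        if code != "linear_transformer_depth: 0":
            kept.append(line)
    return "".join(kept)


def patch_config_file(path):
    """Rewrite a config file in place; keep a backup copy of the original first."""
    p = Path(path)
    p.write_text(strip_linear_transformer_depth(p.read_text(encoding="utf-8")),
                 encoding="utf-8")
```

Matching the whole stripped line (rather than a substring) avoids touching other keys or values that merely contain the same text.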

anvuew commented 1 month ago

Description:

bs_roformer dereverb model (chunk_size: 352768, dim: 256, depth: 8)

Metrics:

SDR noreverb: 8.0770 (small validation set)

Config link: config Checkpoint link: ckpt

Although this is a dereverb model, it will also remove harmonies or vocal effects that are not in the center channel.

If you want to add this model to UVR5, first place the config file and the weights in the corresponding directories (weights in Ultimate Vocal Remover\models\MDX_Net_Models, config file in Ultimate Vocal Remover\models\MDX_Net_Models\model_data\mdx_c_configs). Delete linear_transformer_depth: 0 from the config file and change stft_hop_length: 512 to stft_hop_length: 441. Then open UVR5; you will be prompted to add the model (select the MDX architecture if it is not already selected). Choose the corresponding config file and check the Roformer model checkbox.
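The file placement and config edits described above can be scripted. A minimal sketch, assuming the UVR5 folder layout named in the post under a root directory you pass in (`uvr_root`), and that both settings appear on their own lines as quoted. The helper names are hypothetical, and you still need to open UVR5 afterwards to register the model and tick the Roformer checkbox.

```python
import shutil
from pathlib import Path


def patch_roformer_config_for_uvr(config_text):
    """Drop 'linear_transformer_depth: 0' and change stft_hop_length 512 -> 441."""
    out = []
    for line in config_text.splitlines(keepends=True):
        code = line.split("#", 1)[0].strip()
        if code == "linear_transformer_depth: 0":
            continue  # remove the whole line
        if code == "stft_hop_length: 512":
            line = line.replace("512", "441", 1)  # first occurrence is the value
        out.append(line)
    return "".join(out)


def install_into_uvr(uvr_root, checkpoint, config):
    """Copy the weights and a patched config into the UVR5 folders from the post."""
    models_dir = Path(uvr_root) / "models" / "MDX_Net_Models"
    configs_dir = models_dir / "model_data" / "mdx_c_configs"
    configs_dir.mkdir(parents=True, exist_ok=True)  # also creates models_dir

    weights_dst = models_dir / Path(checkpoint).name
    shutil.copy2(checkpoint, weights_dst)

    config_dst = configs_dir / Path(config).name
    text = Path(config).read_text(encoding="utf-8")
    config_dst.write_text(patch_roformer_config_for_uvr(text), encoding="utf-8")
    return weights_dst, config_dst
```

The original config is left untouched; only the copy placed in mdx_c_configs is modified, so the same file keeps working with the training repo's inference.py.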

musicalman commented 1 month ago

Congratulations on these fine dereverb models! I just tried the bs_roformer dereverb model inside UVR and got this error: RuntimeError: The size of tensor a (352768) must match the size of tensor b (352800) at non-singleton dimension 1. Oddly, the mel_band_roformer one works fine. Since the BS one has a slightly better SDR, I wanted to see whether it was worth switching.
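The two sizes in that error are both about 8 seconds of 44.1 kHz audio, and each is an exact multiple of one of the hop lengths mentioned above: 352768 = 689 × 512 and 352800 = 800 × 441. This suggests, but does not confirm, that the mismatch is related to the 512 to 441 hop-length edit, since 352768 does not divide evenly by 441. A small sketch of that arithmetic, assuming a 44.1 kHz sample rate; `describe_chunk` is a hypothetical helper:

```python
SAMPLE_RATE = 44100  # assumption: 44.1 kHz audio


def describe_chunk(chunk_size, hop_lengths=(512, 441)):
    """Report chunk duration and (frames, remainder) for each hop length."""
    info = {"seconds": round(chunk_size / SAMPLE_RATE, 4)}
    for hop in hop_lengths:
        # divmod gives the whole number of hops and the leftover samples
        info[hop] = divmod(chunk_size, hop)
    return info


for size in (352768, 352800):
    print(size, describe_chunk(size))
```

352768 leaves no remainder for a hop of 512 but a remainder of 409 samples for 441, while 352800 is the reverse (remainder 0 for 441, 32 for 512).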

OMARK313 commented 1 month ago

I need a separation model with a vocals SDR higher than 12.97, please.

lgkt commented 1 month ago

BS Roformer (viperx) is 12.9755? That one is old. The latest one is BS Roformer (finetuned), shown in the image (https://github.com/user-attachments/assets/81202ec1-e2da-4206-be19-5b216533a0f8), but where can I download it? Can anyone tell me?

deton24 commented 1 month ago

If you mean the 2024.03 model, it's the same one. The 12.97 metric comes from a private validation dataset, not the Multisong dataset. All newer Roformers with better metrics are currently not public, so they cannot be downloaded.
