NextAudioGen / ultimatevocalremover_api

API for a Vocal Remover that uses Deep Neural Networks.
MIT License
91 stars, 10 forks

How to support BS_RoFormer model #6

Closed: changqingonly closed this issue 7 months ago

changqingonly commented 7 months ago

The bs_roformer models in “https://github.com/TRvlvr/model_repo/releases/tag/all_public_uvr_models” have been released. How do I add support for these models? Thanks~

MohannadEhabBarakat commented 7 months ago

Do you know which architecture it follows? VR, MDX, etc.?

ShiromiyaG commented 7 months ago

Hi @MohannadEhabBarakat, as far as I know it's a new architecture, but in UVR 5 Beta these models are in the MDX tab.

MohannadEhabBarakat commented 7 months ago

@ShiromiyaG Thanks for the info.

I'm working on docs explaining how to use existing models, add a new model, add a new architecture, and use the hyperparameters of each architecture. However, since this is still far from v1, here is a quick overview of how to add new weights.

Weights are registered in models.json, so if you just add bs_roformer under mdx it should work. It should look like this:

"mdx":{
    "UVR-MDX-NET-Inst_1":{
        "model_path":[
            "https://github.com/TRvlvr/model_repo/releases/download/all_public_uvr_models/UVR-MDX-NET-Inst_1.onnx"
        ]
    },
    "bs_roformer":{
        "model_path":[
            "<path_from_UVR_or_other_source>"
        ]
    }
},

Note: model_path must be a list, so that models made up of multiple files are also covered.
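If you prefer to add the entry from a script rather than by hand, here is a minimal sketch. It assumes models.json is a plain JSON object keyed by architecture name (as in the fragment above); `register_model` is a hypothetical helper, not part of the uvr package, and the example URL is a placeholder.

```python
import json


def register_model(models_json: dict, arch: str, name: str, paths) -> dict:
    """Add a model entry under an architecture section (hypothetical helper).

    model_path must be a list, so a single path/URL string is wrapped in one.
    """
    if isinstance(paths, str):
        paths = [paths]
    if not isinstance(paths, list) or not all(isinstance(p, str) for p in paths):
        raise TypeError("model_path must be a list of path/URL strings")
    # Create the architecture section if it does not exist yet
    models_json.setdefault(arch, {})[name] = {"model_path": paths}
    return models_json


if __name__ == "__main__":
    registry = {"mdx": {}}
    # Placeholder URL: replace with the real checkpoint location
    register_model(registry, "mdx", "bs_roformer", "https://example.com/bs_roformer.ckpt")
    print(json.dumps(registry, indent=4))
```

Wrapping a lone string keeps the "model_path must be a list" rule from being broken by accident when a model has only one file.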

Finally, to use it:

import json

from uvr import models
from uvr.utils.get_models import download_all_models

# Load the model registry and download every model listed in it
with open("/content/ultimatevocalremover_api/src/models_dir/models.json", "r") as f:
    models_json = json.load(f)
download_all_models(models_json)

name = "path/to/your_audio.wav"  # placeholder: path to your audio file
device = "cuda"

# other_metadata: the MDX hyperparameters you want to set (placeholder: empty dict)
bs_roformer = models.MDX(name="bs_roformer", other_metadata={}, device=device, logger=None)

# Separating an audio file
res = bs_roformer(name)
separated_audio = res["separated"]
vocals = separated_audio["vocals"]
bass = separated_audio["bass"]
drums = separated_audio["drums"]
other = separated_audio["other"]

I'll leave this issue open so that you, @changqingonly, can share the results here 💪💪

ShiromiyaG commented 7 months ago

@MohannadEhabBarakat Okay, I'm going to test BS-RoFormer today. By the way, it is in the MDX tab, but in the UVR files the configuration files for these models are in the MDX23C folder.

MohannadEhabBarakat commented 7 months ago

Then I'd suggest adding it under mdxc first (not mdx).
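Moving an existing entry from the mdx section to mdxc can be scripted as below. This is a minimal sketch assuming models.json has top-level "mdx" and "mdxc" objects keyed by model name; `move_model` and `move_model_in_file` are hypothetical helpers, not uvr functions.

```python
import json


def move_model(models_json: dict, name: str, src: str = "mdx", dst: str = "mdxc") -> dict:
    """Move a model entry from one architecture section to another (hypothetical helper)."""
    if name not in models_json.get(src, {}):
        raise KeyError(f"{name!r} not found under {src!r}")
    entry = models_json[src].pop(name)
    models_json.setdefault(dst, {})[name] = entry
    return models_json


def move_model_in_file(path: str, name: str, src: str = "mdx", dst: str = "mdxc") -> None:
    """Apply move_model to a models.json file on disk and write it back."""
    with open(path, "r") as f:
        data = json.load(f)
    move_model(data, name, src, dst)
    with open(path, "w") as f:
        json.dump(data, f, indent=4)
```

For example, `move_model_in_file("src/models_dir/models.json", "bs_roformer")` would relocate the entry, after which it would be loaded with models.MDXC instead of models.MDX.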

ShiromiyaG commented 7 months ago

@MohannadEhabBarakat I tested BS-RoFormer and the inference didn't even start. I believe it won't work without some changes to the code. By the way, UVR's model_data.json file has changed to support these models (and that's probably not the only thing that changed in the code); take a look at it: model_data.json. Also, this was the output I got when trying to run it:

Traceback (most recent call last):
  File "C:\Users\Guilherme\anaconda3\lib\site-packages\ml_collections\config_dict\config_dict.py", line 903, in __getitem__
    field = self._fields[key]
KeyError: 'norm'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Guilherme\anaconda3\lib\site-packages\ml_collections\config_dict\config_dict.py", line 827, in __getattr__
    return self[attribute]
  File "C:\Users\Guilherme\anaconda3\lib\site-packages\ml_collections\config_dict\config_dict.py", line 909, in __getitem__
    raise KeyError(self._generate_did_you_mean_message(key, str(e)))
KeyError: "'norm'"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Biblioteca\Downloads\ultimatevocalremover_api\teste.py", line 15, in <module>
    MDX = models.MDXC(name="bs-roformer", other_metadata={'is_mdx_c_seg_def': True,'segment_size': 384,'batch_size': 4,'overlap_mdx23': 8,'semitone_shift': 0},device=device, logger=None)
  File "C:\Users\Guilherme\anaconda3\lib\site-packages\uvr\models.py", line 370, in __init__
    model_run = mdxc_api.load_modle(model_path, model_data, device)
  File "C:\Users\Guilherme\anaconda3\lib\site-packages\uvr\models_dir\mdxc\mdxc_interface.py", line 99, in load_modle
    model = TFC_TDF_net(model_data, device=device)
  File "C:\Users\Guilherme\anaconda3\lib\site-packages\uvr\models_dir\mdxc\tfc_tdf_v3.py", line 155, in __init__
    norm = get_norm(norm_type=config.model.norm)
  File "C:\Users\Guilherme\anaconda3\lib\site-packages\ml_collections\config_dict\config_dict.py", line 829, in __getattr__
    raise AttributeError(e)
AttributeError: "'norm'"
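The traceback shows `TFC_TDF_net` (the MDX23C network) reading `config.model.norm`, a key that the BS-RoFormer config does not have, so the checkpoint is being fed to the wrong network class. One way code could avoid this is to inspect the config before building the model. This is a minimal sketch under stated assumptions: the key names (`norm`/`act` for MDX23C-style configs, `freqs_per_bands` for BS-RoFormer-style configs) are assumptions about the YAML layout, and `pick_mdxc_model_class` is a hypothetical helper, not part of uvr.

```python
def pick_mdxc_model_class(config: dict) -> str:
    """Guess which network a checkpoint's config describes (hypothetical helper).

    Assumed key names: MDX23C (TFC_TDF_net) configs carry `norm` and `act`
    under `model`; BS-RoFormer configs carry `freqs_per_bands` instead.
    """
    model_cfg = config.get("model", {})
    if "norm" in model_cfg and "act" in model_cfg:
        return "TFC_TDF_net"
    if "freqs_per_bands" in model_cfg:
        return "BSRoformer"
    # Fail early with a readable message instead of a deep KeyError
    raise ValueError(f"Unrecognized model config keys: {sorted(model_cfg)}")
```

With a config loaded from YAML (for example via `yaml.safe_load`), the returned name would select which class to construct, and an unsupported config would raise a clear error before any weights are loaded.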

I don't know if this helps, but these two projects use both BS-RoFormer and VitLarge (which doesn't run even in the UVR beta release):

- MVSEP-MDX23-Colab_v2 by Jarredou
- Music-Source-Separation-Training by ZFTurbo

Also, Jarredou's code has an interesting feature called BigShifts, which doesn't exist in UVR and apparently improves the quality of the final result considerably.
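The general idea behind a shift-based trick like this can be illustrated as test-time shift-and-average: shift the input in time, separate, undo the shift, and average the estimates (the same idea as Demucs' `shifts` option). This is a minimal sketch of that generic technique, not Jarredou's BigShifts code; the `separate` callable and the shift values are placeholders, and circular shifting with `np.roll` is a simplifying assumption.

```python
import numpy as np


def shift_and_average(audio: np.ndarray, separate, shifts) -> np.ndarray:
    """Average a separator's output over several circular time shifts.

    audio: array of shape (channels, samples)
    separate: callable mapping a (channels, samples) array to an estimate of the same shape
    shifts: iterable of integer sample offsets
    """
    outputs = []
    for s in shifts:
        shifted = np.roll(audio, s, axis=-1)        # shift the input in time
        est = separate(shifted)                     # run the (placeholder) separator
        outputs.append(np.roll(est, -s, axis=-1))   # undo the shift on the estimate
    return np.mean(outputs, axis=0)
```

Because each estimate is shifted back before averaging, errors that depend on where the audio falls relative to the model's segment boundaries tend to average out, at the cost of one extra separation pass per shift.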

MohannadEhabBarakat commented 7 months ago

Thanks, that helps. I think this model needs new code (as it is a different implementation). At the moment I'm focusing on adding pipelines, ensembles, and docs; after that I'll work on supporting new models.

Of course, any contribution is welcome.