KitsuneX07 closed this issue 1 month ago.
I think this is a related issue: https://github.com/ZFTurbo/Music-Source-Separation-Training/issues/56
A pre-trained model will return only one stem until you train/fine-tune it with a different config.
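As a hypothetical illustration of the "different config" mentioned above (the keys are modelled on configs/config_musdb18_bs_roformer.yaml and the aspiration-model config discussed later in the thread, so treat the exact names as assumptions):

```yaml
# Sketch only — setting target_instrument to null is what the thread
# reports makes inference return every stem listed in `instruments`.
training:
  instruments: [vocals, bass, drums, other]
  target_instrument: null   # a non-null value (e.g. vocals) yields that single stem
```
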
I see. I found that the pre-trained models only contain the weights for the target_instrument, so there is no way to get the other stem(s) directly from them. Thanks for your reply.
When using the BS_roformer and mel_band_roformer models for inference, I found that only the target_instrument is returned, and the other stem is generated as (mixture - target_instrument). I tried changing the code to remove the "target_instrument" setting, but both output stems then had the shape of the target_instrument.

I noticed that in https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/main/configs/config_musdb18_bs_roformer.yaml the config lists multiple stems plus one target_instrument; however, I couldn't find the weights for that model, and from your code in inference.py I gather there will only be two output stems: vocals and instrumental. I'm curious whether the model is able to return multiple stems (vocals, bass, drums, and other) instead of just the two determined by the target_instrument. If it can, how should the inference code be changed?

Additionally, @SUC-DriverOld suggested I use his aspiration model, whose config has target_instrument set to null. It works fine and does return two stems, which may suggest that mel_band_roformer (and, I guess, BS_roformer too) can handle multi-stem cases. I'm not sure whether the pre-trained models (such as model_bs_roformer_ep_368_sdr_12.9628, whose config sets target_instrument to vocals) can achieve the same effect.
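The subtraction described above — deriving the second stem when a single-target model emits only one — can be sketched as follows. `derive_complement` is a hypothetical helper for illustration, not a function from inference.py:

```python
import numpy as np

def derive_complement(mixture: np.ndarray, target_stem: np.ndarray) -> np.ndarray:
    """Return the non-target stem for a single-target model.

    Models trained with a fixed target_instrument (e.g. vocals) output only
    that stem; the remaining audio is recovered by simple subtraction,
    i.e. other = mixture - target, matching the behaviour reported above.
    """
    return mixture - target_stem

# Tiny demonstration with synthetic 2-channel "audio" arrays.
vocals = np.array([[0.1, 0.2], [0.3, 0.4]])
instrumental = np.array([[0.5, 0.1], [0.0, 0.2]])
mixture = vocals + instrumental          # the mix the model receives
recovered = derive_complement(mixture, vocals)
```

Note that this subtraction is exact only if the model's target estimate is perfect; any estimation error in the target stem leaks directly into the derived stem.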