NextAudioGen / ultimatevocalremover_api

API for a Vocal Remover that uses Deep Neural Networks.
MIT License

[Bug] MDX23C-8KFFT-InstVoc_HQ bug on linux (Google Colab) #5

Open ShiromiyaG opened 8 months ago

ShiromiyaG commented 8 months ago

I was testing MDX23C-8KFFT-InstVoc_HQ on Google Colab and was surprised by the result: the audio was slowed down, the singer sang slowly, and the output was longer than the input. I tested the same song on Windows with the same settings, and the results were normal. Here is the code I used, both in Colab and on Windows:

MDX23C = models.MDXC(
    name="MDX23C-8KFFT-InstVoc_HQ",
    other_metadata={
        'is_mdx_c_seg_def': True,
        'segment_size': 384,
        'batch_size': 8,
        'overlap_mdx23': 8,
        'semitone_shift': 0,
    },
    device=device,
    logger=None,
)
res = MDX23C(input_file)
vocals = res["vocals"]
af.write(f"{no_inst_folder}/{basename}_MDX23C.wav", vocals, MDX23C.sample_rate)

Here is the link to the songs: https://drive.google.com/drive/folders/11aete_dd56XqR68P2cr_BMRlPhvHb7W0?usp=drive_link

And here is an Audacity screenshot of the songs: [image]

MohannadEhabBarakat commented 8 months ago

Are you sure that the input_file had a 44100 Hz sampling rate? The current code doesn't resample automatically.
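
As a rough way to check this, the sketch below reads the sample rate from a WAV header using only the standard library, and computes how much longer a file would play if its samples were written out at 44100 Hz without resampling. The helper names (`wav_sample_rate`, `stretch_factor`, `make_silent_wav`) are hypothetical and not part of this project's API, and the 48000 Hz input is only an illustrative assumption, not a diagnosis of this issue.

```python
import io
import wave

EXPECTED_RATE = 44100  # the rate the question above asks about


def wav_sample_rate(source):
    """Return the sample rate stored in a WAV header (path or file-like object)."""
    with wave.open(source, "rb") as wav:
        return wav.getframerate()


def stretch_factor(actual_rate, assumed_rate=EXPECTED_RATE):
    """Duration ratio when samples recorded at actual_rate are played at assumed_rate.

    A value above 1.0 means the audio plays slower and lasts longer.
    """
    return actual_rate / assumed_rate


def make_silent_wav(rate, seconds=1):
    """Build a small mono 16-bit silent WAV in memory (for demonstration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    buf.seek(0)
    return buf


# Illustrative input: a 48000 Hz file (an assumption, not taken from this thread).
rate = wav_sample_rate(make_silent_wav(48000))
factor = stretch_factor(rate)
```

If `factor` is noticeably above 1.0 for the real input file, a slowed-down and longer output would be consistent with a sample-rate mismatch, and resampling the input to 44100 Hz before separation would be worth trying.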

ShiromiyaG commented 8 months ago

@MohannadEhabBarakat Yes, I'm sure. I don't think I've ever used Hi-Res audio for separation. All the audio I use comes from Deezer.

ShiromiyaG commented 8 months ago

@MohannadEhabBarakat Also, I used the same audio on both Windows and Colab and got different results, which I found strange. Maybe it has something to do with package versions. Here is the requirements file that I used in Colab: requirements.txt. I'm going to test VR models today with the same package versions and report the results.

ShiromiyaG commented 8 months ago

I just tested two models, a VR (karokee_4band_v2_sn) and an MDX (Reverb HQ), and both gave normal results. I remembered that in my last tests I used videos from YT, not audio from Deezer, but I don't think this is the problem, since the normal VR and MDX results also came from a YT video.

ShiromiyaG commented 7 months ago

I tested HQ4, and it has the same problem on both Windows and Linux. It looks like the semitone_shift is wrong. This message also appears:

C:\Users\Guilherme\anaconda3\lib\site-packages\uvr\models_dir\mdx\mdx_interface.py:270: RuntimeWarning: invalid value encountered in divide
  tar_waves = result / divider
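
NumPy typically emits this warning when a division produces NaN, for example 0/0. The sketch below shows one common way to guard such a division. It assumes `result` and `divider` are floating-point NumPy arrays of the same shape and that the warning comes from zero entries in `divider`; neither assumption is confirmed by the source, and `safe_divide` is a hypothetical helper, not a function in this project.

```python
import numpy as np


def safe_divide(result, divider):
    """Divide element-wise, leaving 0.0 wherever divider is zero.

    Hypothetical helper: masked positions are never computed, so
    NumPy does not raise the "invalid value" warning for them.
    """
    result = np.asarray(result, dtype=float)
    divider = np.asarray(divider, dtype=float)
    out = np.zeros_like(result)
    np.divide(result, divider, out=out, where=divider != 0)
    return out


# Small illustration with made-up numbers.
tar_waves = safe_divide([1.0, 0.0, 4.0], [2.0, 0.0, 0.0])
```

Note that silencing the warning this way only hides NaN values; it would not by itself explain or fix slowed-down output.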

MohannadEhabBarakat commented 7 months ago

> @MohannadEhabBarakat Also, I used the same audio on both Windows and Colab and got different results, which I found strange. Maybe it has something to do with package versions. Here is the requirements file that I used in Colab: requirements.txt. I'm going to test VR models today with the same package versions and report the results.

I think that might be caused by package versions or resampling algorithms. I noticed that the UVR GUI uses different resampling depending on the OS. I'm not sure why they did it, but I followed them to replicate the same results. As for package versions, unfortunately even using the same versions might not solve the issue, as some libraries have different implementations on different OSs (even at the same version). The workaround that worked for me in the past was to wrap everything in a Docker image, which basically unifies the OS.
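
To compare the two environments before reaching for Docker, a small report like the sketch below can be run on both Windows and Colab and the outputs diffed. It uses only the standard library. The function name `environment_report` is hypothetical, and the package names in the usage note are assumptions about what this project depends on, not a confirmed list.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages):
    """Collect OS, Python version, and installed versions of the given packages."""
    report = {
        "platform": platform.platform(),
        "python": sys.version.split()[0],
        "packages": {},
    }
    for name in packages:
        try:
            report["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence instead of failing, so reports stay comparable.
            report["packages"][name] = "not installed"
    return report
```

For example, calling `environment_report(["numpy", "librosa", "torch"])` on each machine and comparing the two dictionaries shows at a glance whether the OS or any listed package version differs.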

As I'm back now I'll be working on:

  1. Fixing the bugs you found
  2. Adding new docs
  3. Adding new weights (at least the ones you tested)

So if you can send me an email with your findings and the current bugs, it will help me a lot 🤗. Mohannad.Barakat@fau.de

ShiromiyaG commented 7 months ago

I can try to help, but I don't know how much help I would be, since I don't use most models and end up using only specific ones. In fact, I tested a model that is not available in the UVR repository but that works both in UVR and in your code. If you want to take a look at it, I uploaded it at the link below: https://github.com/ShiromiyaG/RVC-AI-Cover-Maker/releases (it's the karokee model)