ZFTurbo / MVSEP-MDX23-music-separation-model

Model for MDX23 music separation contest

Won't process past 20% #1

Open martel80 opened 1 year ago

martel80 commented 1 year ago

Win 10 22H2, 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz, 16 GB RAM

GPU: RTX 3060

I tried with GPU and CPU.

With CPU, it freezes at 20% (it takes about 5 minutes to get to 20%) and nothing happens. When you click anything, the whole thing crashes.

With GPU, the whole thing crashes, including the cmd prompt window, when reaching 20%.

See CMD Prompt logs:

CPU Rendering:

C:\Users*******\MVSep-MDX23>"./Miniconda/python.exe" gui.py
GPU use: 0
Use device: cpu
Model path: C:\Users*******\MVSep-MDX23/models/Kim_Vocal_1.onnx
Device: cpu
Chunk size: 200000000
Model path: C:\Users*******\MVSep-MDX23/models/Kim_Inst.onnx
Device: cpu
Chunk size: 200000000
Go for: C:/Users/*/OneDrive/Musique/Album/Sean Price, M-Phazes/[E] Land of the Crooks [148615077] [2013]/01 - Sean Price, M-Phazes - Bag of Shit (feat. Loudmouf Choir)(Explicit).flac
Input audio: (2, 7537326) Sample rate: 44100
C:\Users**\Downloads\MVSep-MDX23\inference.py:128: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ..\torch\csrc\utils\tensor_new.cpp:248.)
  mix_waves = torch.tensor(mix_waves, dtype=torch.float32).to(device)

GPU Rendering:

C:\Users**\Downloads\MVSep-MDX23>"./Miniconda/python.exe" gui.py
GPU use: 0
Use device: cuda:0
Model path: C:\Users*****\Downloads\MVSep-MDX23/models/Kim_Vocal_1.onnx
Device: cuda:0
Chunk size: 1000000
Model path: C:\Users\\Downloads\MVSep-MDX23/models/Kim_Inst.onnx
Device: cuda:0
Chunk size: 1000000
Go for: C:/Users//OneDrive/Musique/Album/Sean Price, M-Phazes/[E] Land of the Crooks [148615077] [2013]/01 - Sean Price, M-Phazes - Bag of Shit (feat. Loudmouf Choir)(Explicit).flac
Input audio: (2, 7537326) Sample rate: 44100
C:\Users\\Downloads\MVSep-MDX23\inference.py:128: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ..\torch\csrc\utils\tensor_new.cpp:248.)
  mix_waves = torch.tensor(mix_waves, dtype=torch.float32).to(device)
2023-05-12 13:20:55.9949877 [E:onnxruntime:, sequential_executor.cc:494 onnxruntime::ExecuteKernel] Non-zero status code returned while running FusedConv node. Name:'Conv_3' Status Message: D:\a_work\1\s\onnxruntime\core\framework\bfc_arena.cc:368 onnxruntime::BFCArena::AllocateRawInternal Failed to allocate memory for requested buffer of size 603979776

Traceback (most recent call last):
  File "C:\Users**\Downloads\MVSep-MDX23\gui.py", line 36, in run
    predict_with_model(self.options)
  File "C:\Users***\Downloads\MVSep-MDX23\inference.py", line 479, in predict_with_model
    result, sample_rates = model.separate_music_file(audio.T, sr, update_percent_func, i, len(options['input_audio']))
  File "C:\Users****\Downloads\MVSep-MDX23\inference.py", line 344, in separate_music_file
    sources1 = demix_full(
  File "C:\Users****\Downloads\MVSep-MDX23\inference.py", line 160, in demix_full
    sources = demix_base(mix_part, device, models, infer_session)
  File "C:\Users****\Downloads\MVSep-MDX23\inference.py", line 133, in demix_base
    res = _ort.run(None, {'input': stft_res.cpu().numpy()})[0]
  File "C:\Users****\Downloads\MVSep-MDX23\Miniconda\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 200, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running FusedConv node. Name:'Conv_3' Status Message: D:\a_work\1\s\onnxruntime\core\framework\bfc_arena.cc:368 onnxruntime::BFCArena::AllocateRawInternal Failed to allocate memory for requested buffer of size 603979776
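For scale, the buffer ONNX Runtime fails to allocate above works out to 576 MiB in one piece, on top of what is already in use:

    # Simple arithmetic on the number in the error above (not repo code):
    print(603979776 / 2**20)  # 576.0 -> one 576 MiB buffer for node 'Conv_3'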

ghost commented 1 year ago

Win 11 22H2, 13th Gen Intel(R) Core(TM) i9-13900K @ 3.00 GHz, 64 GB RAM

GPU: RTX 4080 / 16 GB

Tried only on GPU using the GUI / console process.

Same as martel80, but it reaches 60% (very slow, 10-15 min) before the memory crash.

I will test on CPU.

EDIT: The CPU seems to work, but the progress bar is only at 40% after 21 minutes of processing; memory usage (RAM) is ~75%, so 47.5 GB on my machine. The CPU runs between 30% and 60% (~4.15 GHz). Now, after writing this message, the progress bar is still at 40% after 24 minutes of processing!!!

EDIT2: After 70%, the progress bar reaches 100% faster. So processing the 44.1 kHz FLAC file (22 MB, 3 min 23 sec) takes 51 minutes on my CPU. Fortunately the result is really good. Well done guys, really well done!!! A little optimization of the code would be welcome, because I'm not sure that even with 64 GB of RAM a file longer than 5 minutes will pass ;)

EDIT3: The same file (smaller than in the previous test) crashes on my GPU at 70% (after 6 minutes of processing) while trying to allocate 356 MB of VRAM (CUDA out of memory error). Here is the console log captured before the console closed.

Input audio: (2, 8993460) Sample rate: 44100
D:\Separation\MVSep-MDX23\inference.py:128: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ..\torch\csrc\utils\tensor_new.cpp:248.)
  mix_waves = torch.tensor(mix_waves, dtype=torch.float32).to(device)
Traceback (most recent call last):
  File "D:\Separation\MVSep-MDX23\gui.py", line 36, in run
    predict_with_model(self.options)
  File "D:\Separation\MVSep-MDX23\inference.py", line 479, in predict_with_model
    result, sample_rates = model.separate_music_file(audio.T, sr, update_percent_func, i, len(options['input_audio']))
  File "D:\Separation\MVSep-MDX23\inference.py", line 397, in separate_music_file
    out = 0.5 * apply_model(model, audio, shifts=shifts, overlap=overlap)[0].cpu().numpy() \
  File "D:\Separation\MVSep-MDX23\Miniconda\lib\site-packages\demucs\apply.py", line 171, in apply_model
    out = apply_model(sub_model, mix, **kwargs)
  File "D:\Separation\MVSep-MDX23\Miniconda\lib\site-packages\demucs\apply.py", line 196, in apply_model
    shifted_out = apply_model(model, shifted, **kwargs)
  File "D:\Separation\MVSep-MDX23\Miniconda\lib\site-packages\demucs\apply.py", line 226, in apply_model
    chunk_out = future.result()
  File "D:\Separation\MVSep-MDX23\Miniconda\lib\site-packages\demucs\utils.py", line 129, in result
    return self.func(*self.args, **self.kwargs)
  File "D:\Separation\MVSep-MDX23\Miniconda\lib\site-packages\demucs\apply.py", line 241, in apply_model
    out = model(padded_mix)
  File "D:\Separation\MVSep-MDX23\Miniconda\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Separation\MVSep-MDX23\Miniconda\lib\site-packages\demucs\hdemucs.py", line 732, in forward
    x = encode(x, inject)
  File "D:\Separation\MVSep-MDX23\Miniconda\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Separation\MVSep-MDX23\Miniconda\lib\site-packages\demucs\hdemucs.py", line 149, in forward
    y = self.dconv(y)
  File "D:\Separation\MVSep-MDX23\Miniconda\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Separation\MVSep-MDX23\Miniconda\lib\site-packages\demucs\demucs.py", line 153, in forward
    x = x + layer(x)
  File "D:\Separation\MVSep-MDX23\Miniconda\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Separation\MVSep-MDX23\Miniconda\lib\site-packages\torch\nn\modules\container.py", line 217, in forward
    input = module(input)
  File "D:\Separation\MVSep-MDX23\Miniconda\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Separation\MVSep-MDX23\Miniconda\lib\site-packages\torch\nn\modules\conv.py", line 313, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "D:\Separation\MVSep-MDX23\Miniconda\lib\site-packages\torch\nn\modules\conv.py", line 309, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 356.00 MiB (GPU 0; 15.99 GiB total capacity; 2.33 GiB already allocated; 0 bytes free; 2.79 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
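The allocator hint at the end of that message can be tried before CUDA is initialized; a minimal sketch (the 128 MiB split value is only an assumption, not a tested setting):

    # Sketch: apply the max_split_size_mb hint from the OOM message above.
    # Must run before torch initializes CUDA (e.g., at the very top of gui.py).
    import os
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # 128 is a guess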

EDIT4: By changing the chunk size in inference.py to 10000, the GPU process no longer crashes, but it stays very slow ;)

ZFTurbo commented 1 year ago

Setting 'single onnx' in the settings will resolve the issue if you have less than 12 GB of GPU memory. Quality degrades just a little bit.

I will rewrite the code to consume less GPU memory.

sanskar-mk2 commented 1 year ago

Hello friend, can you also add a chunk size slider in the GUI please? I tried to use it with default settings and my GPU is making a lot of noise.

(screenshot attached)

martel80 commented 1 year ago

Setting 'single onnx' in the settings will resolve the issue if you have less than 12 GB of GPU memory. Quality degrades just a little bit.

I will rewrite the code to consume less GPU memory.

It still failed here. It crashed a bit after 20% this time.

ghost commented 1 year ago

Setting 'single onnx' in the settings will resolve the issue if you have less than 12 GB of GPU memory. Quality degrades just a little bit. I will rewrite the code to consume less GPU memory.

It still failed here. It crashed a bit after 20% this time.

In the gui.py file, at line 142, there is a dictionary with the default settings.

        options = {
            'input_audio': root['input_files'],
            'output_folder': root['output_folder'],
            'cpu': root['cpu'],
            'single_onnx': root['single_onnx'],
            'overlap_large': 0.6,
            'overlap_small': 0.5,
        }

Just add a new option called chunk_size to set a lower value; here I use 10000. I don't know if the value impacts performance, but I think you will not have crashes anymore.

        options = {
            'input_audio': root['input_files'],
            'output_folder': root['output_folder'],
            'cpu': root['cpu'],
            'single_onnx': root['single_onnx'],
            'overlap_large': 0.6,
            'overlap_small': 0.5,
            'chunk_size': 10000,
        }
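For intuition, a lower chunk_size bounds peak memory because the model only ever sees one window of the waveform at a time. A minimal sketch of the idea (a hypothetical helper, not the repo's actual demix loop, and without the overlap handling the real code does):

    import numpy as np

    def process_in_chunks(mix: np.ndarray, chunk_size: int, model_fn) -> np.ndarray:
        """Run model_fn over fixed-size windows of mix (channels x samples),
        so peak memory scales with chunk_size instead of track length."""
        out = np.zeros_like(mix)
        for start in range(0, mix.shape[1], chunk_size):
            end = min(start + chunk_size, mix.shape[1])
            out[:, start:end] = model_fn(mix[:, start:end])
        return out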
lucellent commented 1 year ago

I can't fix this issue even with chunk size = 1000 and single ONNX enabled... I have a 3070 with 8 GB VRAM.

ZFTurbo commented 1 year ago

I updated the code. Now it requires less GPU memory. I was able to process a file with the full model on an 8 GB card. It's now the default code. If you need the old, faster code, use the key "--large_gpu".

lucellent commented 1 year ago

Sadly I still get the issue, even with chunk size 500 and single onnx

ZFTurbo commented 1 year ago

Sadly I still get the issue, even with chunk size 500 and single onnx

Are you sure you updated the code? Do you have this message in the log: "Use low GPU memory version of code"?

lucellent commented 1 year ago

Sadly I still get the issue, even with chunk size 500 and single onnx

Are you sure you updated the code? Do you have this message in the log: "Use low GPU memory version of code"?

I don't, I see the same error.

But I'm sure I downloaded the newest zip files; it showed they were updated 3 hours ago.

martel80 commented 1 year ago

I downloaded the new version of the code.

I'm still experiencing the same issue.

I do see "Use low GPU memory version of code".

I also tried with CPU and it also crashed.

ZFTurbo commented 1 year ago

I downloaded the new version of the code. I'm still experiencing the same issue. I do see "Use low GPU memory version of code". I also tried with CPU and it also crashed.

1) Which GPU do you have? 2) Can you try with the single ONNX option?

martel80 commented 1 year ago

I downloaded the new version of the code. I'm still experiencing the same issue. I do see "Use low GPU memory version of code". I also tried with CPU and it also crashed.

1. Which GPU do you have?

2. Can you try with the single ONNX option?

Win 10 22H2, 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz, 16 GB RAM

GPU: RTX 3060

I will try with single ONNX now.

ZFTurbo commented 1 year ago

How much GPU memory?

martel80 commented 1 year ago

How much GPU memory?

12 GB GDDR6

I tried with Single ONNX and it also crashed.

ZFTurbo commented 1 year ago

Can you show me the log? Put "cmd" on the 2nd line of run.bat. It will keep the console open after the crash.
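run.bat would then look something like this (a sketch, assuming its first line is the existing Python call seen in the logs):

    "./Miniconda/python.exe" gui.py
    cmd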

martel80 commented 1 year ago

Can you show me the log? Put "cmd" on the 2nd line of run.bat. It will keep the console open after the crash.

C:\MVSep-MDX23>"./Miniconda/python.exe" gui.py GPU use: 0 Use low GPU memory version of code Use device: cuda:0 Use single vocal ONNX Go for: C:/MVSep-MDX23/01 - Sean Price, M-Phazes - Bag of Shit (feat. Loudmouf Choir)(Explicit).flac Input audio: (2, 7537326) Sample rate: 44100 Model path: C:\MVSep-MDX23/models/Kim_Vocal_1.onnx Device: cuda:0 Chunk size: 1000000 C:\MVSep-MDX23\inference.py:128: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ..\torch\csrc\utils\tensor_new.cpp:248.) mix_waves = torch.tensor(mix_waves, dtype=torch.float32).to(device) 2023-05-15 14:21:22.0004569 [E:onnxruntime:, sequential_executor.cc:494 onnxruntime::ExecuteKernel] Non-zero status code returned while running FusedConv node. Name:'Conv_61' Status Message: D:\a_work\1\s\onnxruntime\core\framework\bfc_arena.cc:368 onnxruntime::BFCArena::AllocateRawInternal Failed to allocate memory for requested buffer of size 11796480

Traceback (most recent call last):
  File "C:\MVSep-MDX23\gui.py", line 36, in run
    predict_with_model(self.options)
  File "C:\MVSep-MDX23\inference.py", line 804, in predict_with_model
    result, sample_rates = model.separate_music_file(audio.T, sr, update_percent_func, i, len(options['input_audio']))
  File "C:\MVSep-MDX23\inference.py", line 587, in separate_music_file
    sources1 = demix_full(
  File "C:\MVSep-MDX23\inference.py", line 160, in demix_full
    sources = demix_base(mix_part, device, models, infer_session)
  File "C:\MVSep-MDX23\inference.py", line 133, in demix_base
    res = _ort.run(None, {'input': stft_res.cpu().numpy()})[0]
  File "C:\MVSep-MDX23\Miniconda\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 200, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running FusedConv node. Name:'Conv_61' Status Message: D:\a_work\1\s\onnxruntime\core\framework\bfc_arena.cc:368 onnxruntime::BFCArena::AllocateRawInternal Failed to allocate memory for requested buffer of size 11796480

C:\MVSep-MDX23>cmd
Microsoft Windows [Version 10.0.19045.2965]
(c) Microsoft Corporation. All rights reserved.

C:\MVSep-MDX23>

martel80 commented 1 year ago

And sorry for the song name, it's just the first FLAC I found in my list haha :)

martel80 commented 1 year ago

My apologies, apparently I only have 8 GB of GDDR6 on my GPU.

I just checked in Task Manager and it says 8 GB, not 12 GB.

Could it be the issue?

EDIT: I'm confused now; I have one window that says 8 GB and another one that says 14 GB.

(screenshots attached)

lucellent commented 1 year ago

My apologies, apparently I only have 8 GB of GDDR6 on my GPU.

I just checked in Task Manager and it says 8 GB, not 12 GB.

Could it be the issue?

EDIT: I'm confused now; I have one window that says 8 GB and another one that says 14 GB.

(screenshots attached)

You have 6 GB of VRAM. GPU 0 is your integrated GPU inside the CPU; it's separate from your NVIDIA RTX 3060 (also, this model doesn't exist with 14 GB).

ZFTurbo commented 1 year ago

Actually you have only 6 GB. Yes, it can be an issue...

Also, it's possible the inference tried to run on the Intel video card.

ZFTurbo commented 1 year ago

Can you check in Task Manager during the run where the memory is allocated: Intel or NVIDIA?

martel80 commented 1 year ago

Well, there you go. We found out why everything crashes. And I thought I had a decent laptop.

It's about to become a very expensive doorstop, let me tell you that.

Back to UVR I guess.

Sorry for the trouble, I genuinely didn't know I had a Computosaur.

martel80 commented 1 year ago

(screenshot attached)

ZFTurbo commented 1 year ago

Mine crashes after the following:

1. Click on start separation
2. Separation fails (already mentioned the issue)
3. Click Stop separation
4. Click start separation again

1. Don't use "Stop separation".
2. Why do you think separation fails? Can you wait 5-10 minutes? From the log I can't see any failure.
lucellent commented 1 year ago

Mine crashes after the following:

1. Click on start separation
2. Separation fails (already mentioned the issue)
3. Click Stop separation
4. Click start separation again

1. Don't use "Stop separation".
2. Why do you think separation fails? Can you wait 5-10 minutes? From the log I can't see any failure.

It fails due to the same issue as the others, displayed in the CMD, but I have 8 GB of VRAM. I will try waiting this time, but GPU usage drops a few seconds after the error message.

Edit: Actually, it looks like it doesn't display any error message indeed; I assumed it did.

lucellent commented 1 year ago

Could it have to do with the fact it converts files to 32-bit?

lucellent commented 1 year ago

Ok, you're right. It actually processed successfully, with default chunk size (1,000,000) and NO single onnx

martel80 commented 1 year ago

Ok, you're right. It actually processed successfully, with default chunk size (1,000,000) and NO single onnx

How long did it take to complete the whole separation?

ZFTurbo commented 1 year ago

So now we know: 8 GB is OK, 6 GB is not. I think with single ONNX it should probably work on 6 GB too.

martel80 commented 1 year ago

So now we know: 8 GB is OK, 6 GB is not. I think with single ONNX it should probably work on 6 GB too.

It doesn't. I tried and it failed. That's the one I pasted up top when you asked.

lucellent commented 1 year ago

Ok, you're right. It actually processed successfully, with default chunk size (1,000,000) and NO single onnx

How long did it take to complete the whole separation?

6-10 minutes

Now that I think about it, it took the same time as the Demucs 4 ht model in the UVR GUI, so that explains it.

Did you try a lower chunk size?

martel80 commented 1 year ago

Ok, you're right. It actually processed successfully, with default chunk size (1,000,000) and NO single onnx

How long did it take to complete the whole separation?

6-10 minutes

Now that I think about it, it took the same time as the Demucs 4 ht model in the UVR GUI, so that explains it.

Did you try a lower chunk size?

See, that's the part that really puzzles me, as it took exactly 1 min 07 sec to separate that same file with htdemucs_6s.

It took exactly 4 min 19 sec to separate that same file with htdemucs_ft. (screenshot attached)

Both on GPU conversion.

So how come it takes longer on your side, yet you are able to separate them, while it takes less time on my side?

I don't understand the logic here.

lucellent commented 1 year ago

Ok, you're right. It actually processed successfully, with default chunk size (1,000,000) and NO single onnx

How long did it take to complete the whole separation?

6-10 minutes. Now that I think about it, it took the same time as the Demucs 4 ht model in the UVR GUI, so that explains it. Did you try a lower chunk size?

See, that's the part that really puzzles me, as it took exactly 1 min 07 sec to separate that same file with htdemucs_6s.

It took exactly 4 min 19 sec to separate that same file with htdemucs_ft

Both on GPU conversion.

So how come it takes longer on your side, yet you are able to separate them, while it takes less time on my side?

I don't understand the logic here.

ZFTurbo's repo is more resource-hungry, that's how I understand it.

Try with a smaller chunk size; someone posted earlier in this thread how to edit the file to set it manually.

ZFTurbo commented 1 year ago

I use a much larger overlap for all models. That's why they are slower.
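For a rough sense of the cost: the number of processing windows grows as 1/(1 - overlap), so an overlap of 0.6 (the overlap_large default shown earlier) means roughly twice as many windows as a more typical 0.25. A small sketch of the arithmetic (not the repo's windowing code; chunk and track length taken from the logs above):

    import math

    def n_windows(total_samples: int, chunk: int, overlap: float) -> int:
        # Windows advance by hop = chunk * (1 - overlap): higher overlap -> more windows.
        hop = int(chunk * (1 - overlap))
        return 1 + math.ceil(max(total_samples - chunk, 0) / hop)

    print(n_windows(7537326, 1000000, 0.25))  # 10 windows
    print(n_windows(7537326, 1000000, 0.60))  # 18 windows -> ~1.8x the work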

martel80 commented 1 year ago

All right. With a chunk size of 10000, I was able to complete the separation in exactly 10 minutes and 05 seconds.

I'm listening to the drum tracks from both, and it seems like there's a lot more bleed in the MVSEP output compared to htdemucs_ft.

You can listen to it for yourself here:

https://1drv.ms/f/s!AtfvmPHCKb6a7QP-Jj1b7p_8xeWQ?e=6ikzqO

Does the chunk size affect the quality of the separation? Or would it be the single ONNX? There's a lot more bleeding in the MVSEP output.

lucellent commented 1 year ago

Processing a second track after a successful first one results in the same issue as martel80, plus a crash.

It doesn't happen if I close the app and open it again; only when I process a song and then another one without closing and reopening the GUI.

(screenshot attached)

ZFTurbo commented 1 year ago

There is a problem with the ONNX models. I try to release the memory, but it stays allocated...