fumiama / Retrieval-based-Voice-Conversion-WebUI

Easily train a good VC model with voice data <= 10 mins!
GNU Affero General Public License v3.0

Unable to train model due to GPU memory limit bug #65

Open LordMilutin opened 1 week ago

LordMilutin commented 1 week ago

Hello! I am having issues with training the model. Everything is running in a Docker container. Here is the first issue, which appears when training the index:

voice-clone  | Minibatch step 539/16242: mean batch inertia: 24.545204162597656, ewa inertia: 24.587732213562003
voice-clone  | Converged (lack of improvement in inertia) at step 539/16242

And after I try to train the model, I get this output:

voice-clone  | INFO:Micy:{'data': {'filter_length': 2048, 'hop_length': 400, 'max_wav_value': 32768.0, 'mel_fmax': None, 'mel_fmin': 0.0, 'n_mel_channels': 125, 'sampling_rate': 40000, 'win_length': 2048, 'training_files': './logs/Micy/filelist.txt'}, 'model': {'filter_channels': 768, 'gin_channels': 256, 'hidden_channels': 192, 'inter_channels': 192, 'kernel_size': 3, 'n_heads': 2, 'n_layers': 6, 'p_dropout': 0, 'resblock': '1', 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'resblock_kernel_sizes': [3, 7, 11], 'spk_embed_dim': 109, 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'upsample_rates': [10, 10, 2, 2], 'use_spectral_norm': False}, 'train': {'batch_size': 1, 'betas': [0.8, 0.99], 'c_kl': 1.0, 'c_mel': 45, 'epochs': 20000, 'eps': 1e-09, 'fp16_run': True, 'init_lr_ratio': 1, 'learning_rate': 0.0001, 'log_interval': 200, 'lr_decay': 0.999875, 'seed': 1234, 'segment_size': 12800, 'warmup_epochs': 0}, 'model_dir': './logs/Micy', 'experiment_dir': './logs/Micy', 'save_every_epoch': 10, 'name': 'Micy', 'total_epoch': 70, 'pretrainG': 'assets/pretrained_v2/f0G40k.pth', 'pretrainD': 'assets/pretrained_v2/f0D40k.pth', 'version': 'v2', 'gpus': '0', 'sample_rate': '40k', 'if_f0': 1, 'if_latest': 0, 'save_every_weights': '1', 'if_cache_data_in_gpu': 0, 'author': ''}
voice-clone  | /usr/local/lib/python3.10/dist-packages/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
voice-clone  |   warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
voice-clone  | INFO:Micy:loaded pretrained assets/pretrained_v2/f0G40k.pth
voice-clone  | INFO:Micy:<All keys matched successfully>
voice-clone  | INFO:Micy:loaded pretrained assets/pretrained_v2/f0D40k.pth
voice-clone  | INFO:Micy:<All keys matched successfully>
voice-clone  | DEBUG:faiss.loader:Environment variable FAISS_OPT_LEVEL is not set, so let's pick the instruction set according to the current CPU
voice-clone  | INFO:faiss.loader:Loading faiss with AVX2 support.
voice-clone  | INFO:faiss.loader:Successfully loaded faiss with AVX2 support.
voice-clone  | DEBUG:faiss.loader:Environment variable FAISS_OPT_LEVEL is not set, so let's pick the instruction set according to the current CPU
voice-clone  | INFO:faiss.loader:Loading faiss with AVX2 support.
voice-clone  | INFO:faiss.loader:Successfully loaded faiss with AVX2 support.
voice-clone  | DEBUG:faiss.loader:Environment variable FAISS_OPT_LEVEL is not set, so let's pick the instruction set according to the current CPU
voice-clone  | INFO:faiss.loader:Loading faiss with AVX2 support.
voice-clone  | INFO:faiss.loader:Successfully loaded faiss with AVX2 support.
voice-clone  | DEBUG:faiss.loader:Environment variable FAISS_OPT_LEVEL is not set, so let's pick the instruction set according to the current CPU
voice-clone  | INFO:faiss.loader:Loading faiss with AVX2 support.
voice-clone  | INFO:faiss.loader:Successfully loaded faiss with AVX2 support.
voice-clone  | /usr/local/lib/python3.10/dist-packages/torch/autograd/graph.py:744: UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed.  This is not an error, but may impair performance.
voice-clone  | grad.sizes() = [64, 1, 4], strides() = [4, 1, 1]
voice-clone  | bucket_view.sizes() = [64, 1, 4], strides() = [4, 4, 1] (Triggered internally at ../torch/csrc/distributed/c10d/reducer.cpp:325.)
voice-clone  |   return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
voice-clone  | INFO:Micy:Train Epoch: 1 [0%]
voice-clone  | INFO:Micy:[0, 0.0001]
voice-clone  | INFO:Micy:loss_disc=4.450, loss_gen=2.383, loss_fm=2.990,loss_mel=30.139, loss_kl=9.000
voice-clone  | DEBUG:matplotlib:matplotlib data path: /usr/local/lib/python3.10/dist-packages/matplotlib/mpl-data
voice-clone  | DEBUG:matplotlib:CONFIGDIR=/root/.config/matplotlib
voice-clone  | DEBUG:matplotlib:interactive is False
voice-clone  | DEBUG:matplotlib:platform is linux
voice-clone  | Process Process-1:
voice-clone  | Traceback (most recent call last):
voice-clone  |   File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
voice-clone  |     self.run()
voice-clone  |   File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
voice-clone  |     self._target(*self._args, **self._kwargs)
voice-clone  |   File "/app/infer/modules/train/train.py", line 278, in run
voice-clone  |     train_and_evaluate(
voice-clone  |   File "/app/infer/modules/train/train.py", line 508, in train_and_evaluate
voice-clone  |     scaler.scale(loss_gen_all).backward()
voice-clone  |   File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 525, in backward
voice-clone  |     torch.autograd.backward(
voice-clone  |   File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 267, in backward
voice-clone  |     _engine_run_backward(
voice-clone  |   File "/usr/local/lib/python3.10/dist-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
voice-clone  |     return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
voice-clone  | RuntimeError: CUDA error: out of memory
voice-clone  | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
voice-clone  | For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
voice-clone  | Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
voice-clone  |
voice-clone  | /usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 4 leaked semaphore objects to clean up at shutdown
voice-clone  |   warnings.warn('resource_tracker: There appear to be %d '

However, the out-of-memory error seems wrong: my GPU has 8 GB, but while I monitor its usage, it never goes above 4 GB:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.52.01              Driver Version: 555.99         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        On  |   00000000:01:00.0  On |                  N/A |
| 54%   48C    P2             33W /  130W |    3654MiB /   8192MiB |     83%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        25      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        26      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
|    0   N/A  N/A     15652      C   /python3.10                                 N/A      |
+-----------------------------------------------------------------------------------------+
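A polled nvidia-smi log like the one above can be scanned for its peak memory reading with a short script. The following is only a sketch: the function name `peak_memory_mib` and the sample lines are illustrative, and it assumes the `<used>MiB / <total>MiB` column layout shown in the snapshots in this thread. Note that a log polled at intervals can miss a short allocation spike that happens between two snapshots, so a low peak here does not rule out a brief burst.

```python
import re

# Matches the "Memory-Usage" column, e.g. "3654MiB /   8192MiB".
# Power readings such as "33W /  130W" do not match because of the MiB suffix.
MEM_RE = re.compile(r"(\d+)MiB\s*/\s*(\d+)MiB")


def peak_memory_mib(log_text):
    """Return (peak_used_mib, total_mib) over all snapshots in log_text.

    Returns (0, 0) if no memory reading is found.
    """
    peak, total = 0, 0
    for match in MEM_RE.finditer(log_text):
        used, cap = int(match.group(1)), int(match.group(2))
        if used > peak:
            peak, total = used, cap
    return peak, total


# Three memory lines copied from snapshots posted in this thread.
sample = """
| 66%   48C    P8             16W /  130W |     562MiB /   8192MiB |      0%      Default |
| 66%   50C    P2             33W /  130W |    1357MiB /   8192MiB |     11%      Default |
| 54%   48C    P2             33W /  130W |    3654MiB /   8192MiB |     83%      Default |
"""

print(peak_memory_mib(sample))  # (3654, 8192)
```

Feeding the full captured log into `peak_memory_mib` gives the highest sampled value, which can then be compared against the 8192 MiB total.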
fumiama commented 1 week ago

It seems that you are using some random unofficial version of RVC. Please try this repo first. We will not solve problems that do not come from this repo.

LordMilutin commented 1 week ago

No, I am using this one. I have just named my Docker container voice-clone so that I can enter and inspect it more easily. Here is the output:

voice-clone  | INFO:Micy:loaded pretrained assets/pretrained_v2/f0G40k.pth
voice-clone  | INFO:Micy:<All keys matched successfully>
voice-clone  | INFO:Micy:loaded pretrained assets/pretrained_v2/f0D40k.pth
voice-clone  | INFO:Micy:<All keys matched successfully>
voice-clone  | DEBUG:faiss.loader:Environment variable FAISS_OPT_LEVEL is not set, so let's pick the instruction set according to the current CPU
voice-clone  | INFO:faiss.loader:Loading faiss with AVX2 support.
voice-clone  | INFO:faiss.loader:Successfully loaded faiss with AVX2 support.
voice-clone  | DEBUG:faiss.loader:Environment variable FAISS_OPT_LEVEL is not set, so let's pick the instruction set according to the current CPU
voice-clone  | INFO:faiss.loader:Loading faiss with AVX2 support.
voice-clone  | INFO:faiss.loader:Successfully loaded faiss with AVX2 support.
voice-clone  | DEBUG:faiss.loader:Environment variable FAISS_OPT_LEVEL is not set, so let's pick the instruction set according to the current CPU
voice-clone  | INFO:faiss.loader:Loading faiss with AVX2 support.
voice-clone  | INFO:faiss.loader:Successfully loaded faiss with AVX2 support.
voice-clone  | DEBUG:faiss.loader:Environment variable FAISS_OPT_LEVEL is not set, so let's pick the instruction set according to the current CPU
voice-clone  | INFO:faiss.loader:Loading faiss with AVX2 support.
voice-clone  | INFO:faiss.loader:Successfully loaded faiss with AVX2 support.
voice-clone  | /usr/local/lib/python3.10/dist-packages/torch/autograd/graph.py:744: UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed.  This is not an error, but may impair performance.
voice-clone  | grad.sizes() = [64, 1, 4], strides() = [4, 1, 1]
voice-clone  | bucket_view.sizes() = [64, 1, 4], strides() = [4, 4, 1] (Triggered internally at ../torch/csrc/distributed/c10d/reducer.cpp:325.)
voice-clone  |   return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
voice-clone  | INFO:Micy:Train Epoch: 1 [0%]
voice-clone  | INFO:Micy:[0, 0.0001]
voice-clone  | INFO:Micy:loss_disc=3.908, loss_gen=2.790, loss_fm=18.868,loss_mel=24.102, loss_kl=9.000
voice-clone  | DEBUG:matplotlib:matplotlib data path: /usr/local/lib/python3.10/dist-packages/matplotlib/mpl-data
voice-clone  | DEBUG:matplotlib:CONFIGDIR=/root/.config/matplotlib
voice-clone  | DEBUG:matplotlib:interactive is False
voice-clone  | DEBUG:matplotlib:platform is linux
voice-clone  | Process Process-1:
voice-clone  | Traceback (most recent call last):
voice-clone  |   File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
voice-clone  |     self.run()
voice-clone  |   File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
voice-clone  |     self._target(*self._args, **self._kwargs)
voice-clone  |   File "/app/infer/modules/train/train.py", line 278, in run
voice-clone  |     train_and_evaluate(
voice-clone  |   File "/app/infer/modules/train/train.py", line 508, in train_and_evaluate
voice-clone  |     scaler.scale(loss_gen_all).backward()
voice-clone  |   File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 525, in backward
voice-clone  |     torch.autograd.backward(
voice-clone  |   File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 267, in backward
voice-clone  |     _engine_run_backward(
voice-clone  |   File "/usr/local/lib/python3.10/dist-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
voice-clone  |     return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
voice-clone  | RuntimeError: CUDA error: out of memory
voice-clone  | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
voice-clone  | For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
voice-clone  | Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
voice-clone  |
voice-clone  | /usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 4 leaked semaphore objects to clean up at shutdown
voice-clone  |   warnings.warn('resource_tracker: There appear to be %d '
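Since the traceback reports `CUDA error: out of memory` while the external monitor shows free memory, it may help to ask PyTorch inside the container what it sees. The following is a minimal sketch, not part of this repo: the helper name `cuda_memory_report` is hypothetical, and it assumes a PyTorch build recent enough to provide `torch.cuda.mem_get_info`. It returns `None` when PyTorch or a CUDA device is not available.

```python
def cuda_memory_report(device_index=0):
    """Return a dict of memory figures (in MiB) as seen by PyTorch, or None."""
    try:
        import torch
    except (ImportError, OSError):
        # PyTorch is not installed or its native libraries failed to load.
        return None
    if not torch.cuda.is_available():
        return None

    mib = 1024 * 1024
    # Free and total device memory as reported by the CUDA driver.
    free_b, total_b = torch.cuda.mem_get_info(device_index)
    return {
        "free_mib": free_b // mib,
        "total_mib": total_b // mib,
        # Memory currently held by tensors in this process.
        "allocated_mib": torch.cuda.memory_allocated(device_index) // mib,
        # Memory held by PyTorch's caching allocator in this process.
        "reserved_mib": torch.cuda.memory_reserved(device_index) // mib,
        # Highest tensor allocation seen so far in this process.
        "peak_allocated_mib": torch.cuda.max_memory_allocated(device_index) // mib,
    }


if __name__ == "__main__":
    print(cuda_memory_report())
```

If `total_mib` printed inside the container is well below the 8192 MiB shown by nvidia-smi, the limit would be coming from the environment rather than from the training code; that is only one possibility to check, not a diagnosis.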
LordMilutin commented 1 week ago

Here is the GPU usage log. As you can see, it never goes above 4 GB, which is very weird. I have a Stable Diffusion container that uses 7 GB without any problems, and it works...

Sat Jun 29 14:10:02 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        On  |   00000000:01:00.0  On |                  N/A |
| 66%   48C    P8             16W /  130W |     562MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
|    0   N/A  N/A       368      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+
Sat Jun 29 14:10:03 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        On  |   00000000:01:00.0  On |                  N/A |
| 66%   48C    P8             15W /  130W |     562MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
|    0   N/A  N/A       368      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+
Sat Jun 29 14:10:04 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        On  |   00000000:01:00.0  On |                  N/A |
| 66%   48C    P8             16W /  130W |     562MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
|    0   N/A  N/A       368      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+
Sat Jun 29 14:10:05 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        On  |   00000000:01:00.0  On |                  N/A |
| 66%   48C    P8             16W /  130W |     562MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
|    0   N/A  N/A       368      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+
Sat Jun 29 14:10:06 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        On  |   00000000:01:00.0  On |                  N/A |
| 66%   49C    P2             26W /  130W |     791MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
|    0   N/A  N/A       368      G   /Xwayland                                   N/A      |
|    0   N/A  N/A      3828      C   /python3.10                                 N/A      |
+-----------------------------------------------------------------------------------------+
Sat Jun 29 14:10:07 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        On  |   00000000:01:00.0  On |                  N/A |
| 66%   50C    P2             33W /  130W |    1357MiB /   8192MiB |     11%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
|    0   N/A  N/A       368      G   /Xwayland                                   N/A      |
|    0   N/A  N/A      3828      C   /python3.10                                 N/A      |
+-----------------------------------------------------------------------------------------+
Sat Jun 29 14:10:08 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        On  |   00000000:01:00.0  On |                  N/A |
| 66%   50C    P2             37W /  130W |    1885MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
|    0   N/A  N/A       368      G   /Xwayland                                   N/A      |
|    0   N/A  N/A      3828      C   /python3.10                                 N/A      |
+-----------------------------------------------------------------------------------------+
Sat Jun 29 14:10:09 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        On  |   00000000:01:00.0  On |                  N/A |
| 66%   50C    P2             38W /  130W |    1885MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
|    0   N/A  N/A       368      G   /Xwayland                                   N/A      |
|    0   N/A  N/A      3828      C   /python3.10                                 N/A      |
+-----------------------------------------------------------------------------------------+
Sat Jun 29 14:10:10 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        On  |   00000000:01:00.0  On |                  N/A |
| 66%   50C    P2             33W /  130W |    1885MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
|    0   N/A  N/A       368      G   /Xwayland                                   N/A      |
|    0   N/A  N/A      3828      C   /python3.10                                 N/A      |
+-----------------------------------------------------------------------------------------+
Sat Jun 29 14:10:11 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        On  |   00000000:01:00.0  On |                  N/A |
| 66%   50C    P2             32W /  130W |    1885MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
|    0   N/A  N/A       368      G   /Xwayland                                   N/A      |
|    0   N/A  N/A      3828      C   /python3.10                                 N/A      |
+-----------------------------------------------------------------------------------------+
Sat Jun 29 14:10:12 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        On  |   00000000:01:00.0  On |                  N/A |
| 66%   48C    P5             23W /  130W |    1885MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
|    0   N/A  N/A       368      G   /Xwayland                                   N/A      |
|    0   N/A  N/A      3828      C   /python3.10                                 N/A      |
+-----------------------------------------------------------------------------------------+
Sat Jun 29 14:10:13 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        On  |   00000000:01:00.0  On |                  N/A |
| 66%   48C    P8             16W /  130W |    1885MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
|    0   N/A  N/A       368      G   /Xwayland                                   N/A      |
|    0   N/A  N/A      3828      C   /python3.10                                 N/A      |
+-----------------------------------------------------------------------------------------+
Sat Jun 29 14:10:14 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        On  |   00000000:01:00.0  On |                  N/A |
| 66%   48C    P8             16W /  130W |    1885MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
|    0   N/A  N/A       368      G   /Xwayland                                   N/A      |
|    0   N/A  N/A      3828      C   /python3.10                                 N/A      |
+-----------------------------------------------------------------------------------------+
Sat Jun 29 14:10:15 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        On  |   00000000:01:00.0  On |                  N/A |
| 66%   48C    P8             15W /  130W |    1887MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
|    0   N/A  N/A       368      G   /Xwayland                                   N/A      |
|    0   N/A  N/A      3828      C   /python3.10                                 N/A      |
+-----------------------------------------------------------------------------------------+
Sat Jun 29 14:10:16 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        On  |   00000000:01:00.0  On |                  N/A |
| 66%   48C    P8             16W /  130W |    1967MiB /   8192MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
|    0   N/A  N/A       368      G   /Xwayland                                   N/A      |
|    0   N/A  N/A      3828      C   /python3.10                                 N/A      |
+-----------------------------------------------------------------------------------------+
Sat Jun 29 14:10:17 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        On  |   00000000:01:00.0  On |                  N/A |
| 65%   48C    P8             16W /  130W |    2135MiB /   8192MiB |     46%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
|    0   N/A  N/A       368      G   /Xwayland                                   N/A      |
|    0   N/A  N/A      3828      C   /python3.10                                 N/A      |
+-----------------------------------------------------------------------------------------+
Sat Jun 29 14:10:18 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        On  |   00000000:01:00.0  On |                  N/A |
| 65%   49C    P2             23W /  130W |    2577MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
|    0   N/A  N/A       368      G   /Xwayland                                   N/A      |
|    0   N/A  N/A      3828      C   /python3.10                                 N/A      |
+-----------------------------------------------------------------------------------------+
Sat Jun 29 14:10:19 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        On  |   00000000:01:00.0  On |                  N/A |
| 65%   50C    P2             35W /  130W |    2759MiB /   8192MiB |     39%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
|    0   N/A  N/A       368      G   /Xwayland                                   N/A      |
|    0   N/A  N/A      3828      C   /python3.10                                 N/A      |
+-----------------------------------------------------------------------------------------+
Sat Jun 29 14:10:20 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        On  |   00000000:01:00.0  On |                  N/A |
| 66%   49C    P2             34W /  130W |    3053MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
|    0   N/A  N/A       368      G   /Xwayland                                   N/A      |
|    0   N/A  N/A      3828      C   /python3.10                                 N/A      |
+-----------------------------------------------------------------------------------------+
Sat Jun 29 14:10:21 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        On  |   00000000:01:00.0  On |                  N/A |
| 66%   48C    P3             25W /  130W |    3055MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
|    0   N/A  N/A       368      G   /Xwayland                                   N/A      |
|    0   N/A  N/A      3828      C   /python3.10                                 N/A      |
+-----------------------------------------------------------------------------------------+
Sat Jun 29 14:10:22 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        On  |   00000000:01:00.0  On |                  N/A |
| 65%   50C    P2             34W /  130W |    3243MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
|    0   N/A  N/A       368      G   /Xwayland                                   N/A      |
|    0   N/A  N/A      3828      C   /python3.10                                 N/A      |
+-----------------------------------------------------------------------------------------+
Sat Jun 29 14:10:23 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        On  |   00000000:01:00.0  On |                  N/A |
| 65%   50C    P2             38W /  130W |    1903MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
|    0   N/A  N/A       368      G   /Xwayland                                   N/A      |
|    0   N/A  N/A      3828      C   /python3.10                                 N/A      |
+-----------------------------------------------------------------------------------------+
Sat Jun 29 14:10:24 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050        On  |   00000000:01:00.0  On |                  N/A |
| 65%   48C    P3             31W /  130W |     562MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        35      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        38      G   /Xwayland                                   N/A      |
|    0   N/A  N/A       368      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+
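Reading the peak out of one-second snapshots like these by eye is error-prone. As a rough sketch (the `peak_memory` helper and the regex are illustrative, not part of this repository), the `used / total` column can be pulled out of pasted `nvidia-smi` output and the highest reading reported:

```python
import re

# Matches the "used / total" memory column, e.g. "1885MiB /   8192MiB".
# The power column ("16W /  130W") is not matched because it lacks "MiB".
MEM_RE = re.compile(r"(\d+)MiB\s*/\s*(\d+)MiB")


def peak_memory(log_text):
    """Return (peak_used_mib, total_mib) over all snapshots, or None if none found."""
    readings = [(int(used), int(total)) for used, total in MEM_RE.findall(log_text)]
    if not readings:
        return None
    # Pick the snapshot with the highest "used" value.
    return max(readings, key=lambda reading: reading[0])


# Three rows copied from the snapshots above.
sample = """
| 66%   48C    P8             16W /  130W |    1885MiB /   8192MiB |      0%      Default |
| 65%   50C    P2             34W /  130W |    3243MiB /   8192MiB |      0%      Default |
| 65%   48C    P3             31W /  130W |     562MiB /   8192MiB |      0%      Default |
"""
print(peak_memory(sample))  # (3243, 8192)
```

In the snapshots shown here the highest reading is 3243MiB of 8192MiB, so the reported usage stays well under the card's capacity before the process disappears from the list.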
alexlnkp commented 1 week ago

No, I am using this one. I have just named my Docker container voice-clone so I can enter and inspect it more easily. Here is the output:

voice-clone  | INFO:Micy:loaded pretrained assets/pretrained_v2/f0G40k.pth
voice-clone  | INFO:Micy:<All keys matched successfully>
voice-clone  | INFO:Micy:loaded pretrained assets/pretrained_v2/f0D40k.pth
voice-clone  | INFO:Micy:<All keys matched successfully>
voice-clone  | DEBUG:faiss.loader:Environment variable FAISS_OPT_LEVEL is not set, so let's pick the instruction set according to the current CPU
voice-clone  | INFO:faiss.loader:Loading faiss with AVX2 support.
voice-clone  | INFO:faiss.loader:Successfully loaded faiss with AVX2 support.
voice-clone  | DEBUG:faiss.loader:Environment variable FAISS_OPT_LEVEL is not set, so let's pick the instruction set according to the current CPU
voice-clone  | INFO:faiss.loader:Loading faiss with AVX2 support.
voice-clone  | INFO:faiss.loader:Successfully loaded faiss with AVX2 support.
voice-clone  | DEBUG:faiss.loader:Environment variable FAISS_OPT_LEVEL is not set, so let's pick the instruction set according to the current CPU
voice-clone  | INFO:faiss.loader:Loading faiss with AVX2 support.
voice-clone  | INFO:faiss.loader:Successfully loaded faiss with AVX2 support.
voice-clone  | DEBUG:faiss.loader:Environment variable FAISS_OPT_LEVEL is not set, so let's pick the instruction set according to the current CPU
voice-clone  | INFO:faiss.loader:Loading faiss with AVX2 support.
voice-clone  | INFO:faiss.loader:Successfully loaded faiss with AVX2 support.
voice-clone  | /usr/local/lib/python3.10/dist-packages/torch/autograd/graph.py:744: UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed.  This is not an error, but may impair performance.
voice-clone  | grad.sizes() = [64, 1, 4], strides() = [4, 1, 1]
voice-clone  | bucket_view.sizes() = [64, 1, 4], strides() = [4, 4, 1] (Triggered internally at ../torch/csrc/distributed/c10d/reducer.cpp:325.)
voice-clone  |   return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
voice-clone  | INFO:Micy:Train Epoch: 1 [0%]
voice-clone  | INFO:Micy:[0, 0.0001]
voice-clone  | INFO:Micy:loss_disc=3.908, loss_gen=2.790, loss_fm=18.868,loss_mel=24.102, loss_kl=9.000
voice-clone  | DEBUG:matplotlib:matplotlib data path: /usr/local/lib/python3.10/dist-packages/matplotlib/mpl-data
voice-clone  | DEBUG:matplotlib:CONFIGDIR=/root/.config/matplotlib
voice-clone  | DEBUG:matplotlib:interactive is False
voice-clone  | DEBUG:matplotlib:platform is linux
voice-clone  | Process Process-1:
voice-clone  | Traceback (most recent call last):
voice-clone  |   File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
voice-clone  |     self.run()
voice-clone  |   File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
voice-clone  |     self._target(*self._args, **self._kwargs)
voice-clone  |   File "/app/infer/modules/train/train.py", line 278, in run
voice-clone  |     train_and_evaluate(
voice-clone  |   File "/app/infer/modules/train/train.py", line 508, in train_and_evaluate
voice-clone  |     scaler.scale(loss_gen_all).backward()
voice-clone  |   File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 525, in backward
voice-clone  |     torch.autograd.backward(
voice-clone  |   File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 267, in backward
voice-clone  |     _engine_run_backward(
voice-clone  |   File "/usr/local/lib/python3.10/dist-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
voice-clone  |     return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
voice-clone  | RuntimeError: CUDA error: out of memory
voice-clone  | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
voice-clone  | For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
voice-clone  | Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
voice-clone  |
voice-clone  | /usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 4 leaked semaphore objects to clean up at shutdown
voice-clone  |   warnings.warn('resource_tracker: There appear to be %d '

It would appear that you need a special torch version? The torch version is usually chosen based on what your system has available (e.g., if CUDA is available, you get the torch build with CUDA support). Have you tried manually cloning torch, building the Python wheel, and installing it within the container? That would give more insight into whether it's a torch bug.
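Before rebuilding torch, it may help to confirm which build is actually installed in the container and how much GPU memory it can see. The snippet below is only a diagnostic sketch: `describe_torch_env` is a hypothetical helper, not something from this repository, and it assumes device index 0, matching `'gpus': '0'` in the training config.

```python
import importlib.util


def describe_torch_env():
    """Collect basic facts about the installed torch build and the visible GPU."""
    info = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not info["torch_installed"]:
        return info

    import torch

    info["torch_version"] = torch.__version__
    # CUDA version the wheel was built against; None for CPU-only builds.
    info["built_for_cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()

    if info["cuda_available"]:
        try:
            # Free and total device memory in bytes, as seen by the CUDA driver.
            free_bytes, total_bytes = torch.cuda.mem_get_info(0)
            info["free_mib"] = free_bytes // 2**20
            info["total_mib"] = total_bytes // 2**20
        except RuntimeError as err:
            # Creating the CUDA context can itself fail, e.g. with an OOM error.
            info["cuda_error"] = str(err)
    return info


for key, value in describe_torch_env().items():
    print(f"{key}: {value}")
```

If `built_for_cuda` is `None`, or `free_mib` is far below what `nvidia-smi` reports as unused, that would point to the torch build or the container's GPU setup rather than the training code.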