Mangio621 / Mangio-RVC-Fork

*CREPE+HYBRID TRAINING* A very experimental fork of the Retrieval-based-Voice-Conversion-WebUI repo that incorporates a variety of other f0 methods, along with a hybrid f0 nanmedian method.
MIT License

CUDA error: an illegal memory access was encountered while training a model (RVC V2) #212

Open · Den4ikSeekers opened this issue 3 months ago

Den4ikSeekers commented 3 months ago

I get this error constantly when training a new model with a batch size of 3 or more. Initially I started training with the default batch size of 4, but it crashed very easily. When the problem occurs, my screen goes black for a couple of seconds and then the cmd window prints the error:

RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
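
Following the hint in the error text, my plan for the next run is to set CUDA_LAUNCH_BLOCKING=1 so CUDA errors are reported synchronously and the traceback points at the actual failing op. The launcher below is just my own sketch (the script name is made up; the command and arguments are copied from my run shown in the log further down):

```python
# rerun_with_blocking.py -- my own wrapper, not part of the repo.
# Sets CUDA_LAUNCH_BLOCKING=1 before launching training so kernel errors
# surface at the call that actually failed.
import os
import subprocess

env = os.environ.copy()
env["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before the CUDA context is created

# Same command/arguments as in the training log below.
cmd = [
    r"runtime\python.exe", "train_nsf_sim_cache_sid_load_pretrain.py",
    "-e", "Den4ikSeekersV3", "-sr", "40k", "-f0", "1", "-bs", "3", "-g", "0",
    "-te", "50", "-se", "5",
    "-pg", "pretrained_v2/f0G40k.pth", "-pd", "pretrained_v2/f0D40k.pth",
    "-l", "1", "-c", "0", "-sw", "1", "-v", "v2", "-li", "279",
]
subprocess.run(cmd, env=env, check=True)
```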

I also noticed that the training crashed when left running for a long time (batch size set to 4); at that point, simply moving the mouse a bit was enough to make it error out instantly.

My laptop hardware:
- CPU: Intel Core i7-11800H
- RAM: 32 GB
- GPU: RTX 3070 (8 GB VRAM)
- Dataset length (single file): 53 minutes
- Dataset format: WAV

I think this time it might have crashed because I was using the laptop for other things in the background: simple chatting, monitoring the TensorBoard graphs, and watching RVC YouTube tutorials. However, it is not the out-of-memory error that others have encountered previously.
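
To double-check the memory side of it, I can watch the device-wide free VRAM from a second window while training runs. This small helper is just my own sketch (torch.cuda.mem_get_info reports free/total bytes for the whole GPU, so it also sees the training process):

```python
# vram_watch.py -- my own helper, not part of the repo.
# Polls device-wide free VRAM every few seconds while training runs in
# another window, to see whether the crash lines up with the 8 GB card
# actually filling up.
import time
import torch

while True:
    free, total = torch.cuda.mem_get_info()  # device-wide bytes (all processes)
    print(f"free {free / 1024**3:5.2f} GiB / total {total / 1024**3:5.2f} GiB")
    time.sleep(5)
```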

The log:

write filelist done
use gpus: 0
runtime\python.exe train_nsf_sim_cache_sid_load_pretrain.py -e Den4ikSeekersV3 -sr 40k -f0 1 -bs 3 -g 0 -te 50 -se 5 -pg pretrained_v2/f0G40k.pth -pd pretrained_v2/f0D40k.pth -l 1 -c 0 -sw 1 -v v2 -li 279
INFO:Den4ikSeekersV3:{'train': {'log_interval': 279, 'seed': 1234, 'epochs': 20000, 'learning_rate': 0.0001, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 3, 'fp16_run': True, 'lr_decay': 0.999875, 'segment_size': 12800, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'max_wav_value': 32768.0, 'sampling_rate': 40000, 'filter_length': 2048, 'hop_length': 400, 'win_length': 2048, 'n_mel_channels': 125, 'mel_fmin': 0.0, 'mel_fmax': None, 'training_files': './logs\\Den4ikSeekersV3/filelist.txt'}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1,3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [10, 10, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'use_spectral_norm': False, 'gin_channels': 256, 'spk_embed_dim': 109}, 'model_dir': './logs\\Den4ikSeekersV3', 'experiment_dir': './logs\\Den4ikSeekersV3', 'save_every_epoch': 5, 'name': 'Den4ikSeekersV3', 'total_epoch': 50, 'pretrainG': 'pretrained_v2/f0G40k.pth', 'pretrainD': 'pretrained_v2/f0D40k.pth', 'version': 'v2', 'gpus': '0', 'sample_rate': '40k', 'if_f0': 1, 'if_latest': 1, 'save_every_weights': '1', 'if_cache_data_in_gpu': 0}
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
gin_channels: 256 self.spk_embed_dim: 109
INFO:Den4ikSeekersV3:loaded pretrained pretrained_v2/f0G40k.pth
<All keys matched successfully>
INFO:Den4ikSeekersV3:loaded pretrained pretrained_v2/f0D40k.pth
<All keys matched successfully>
D:\INSTALLS\RVC\runtime\lib\site-packages\torch\functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\SpectralOps.cpp:867.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
D:\INSTALLS\RVC\runtime\lib\site-packages\torch\autograd\__init__.py:200: UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance.
grad.sizes() = [64, 1, 4], strides() = [4, 1, 1]
bucket_view.sizes() = [64, 1, 4], strides() = [4, 4, 1] (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\reducer.cpp:337.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
INFO:Den4ikSeekersV3:Train Epoch: 1 [0%]
INFO:Den4ikSeekersV3:[0, 0.0001]
INFO:Den4ikSeekersV3:loss_disc=3.713, loss_gen=3.484, loss_fm=11.058,loss_mel=25.817, loss_kl=5.464
DEBUG:matplotlib:matplotlib data path: D:\INSTALLS\RVC\runtime\lib\site-packages\matplotlib\mpl-data
DEBUG:matplotlib:CONFIGDIR=C:\Users\Omem\.matplotlib
DEBUG:matplotlib:interactive is False
DEBUG:matplotlib:platform is win32
INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
max value is  tensor(1.0843)
INFO:Den4ikSeekersV3:====> Epoch: 1 [2024-03-12 18:49:07] | (0:02:22.698938)
INFO:Den4ikSeekersV3:Train Epoch: 2 [2%]
INFO:Den4ikSeekersV3:[279, 9.99875e-05]
INFO:Den4ikSeekersV3:loss_disc=4.333, loss_gen=3.062, loss_fm=5.908,loss_mel=18.143, loss_kl=1.504
INFO:Den4ikSeekersV3:====> Epoch: 2 [2024-03-12 18:51:24] | (0:02:17.632440)
INFO:Den4ikSeekersV3:Train Epoch: 3 [4%]
INFO:Den4ikSeekersV3:[558, 9.99750015625e-05]
INFO:Den4ikSeekersV3:loss_disc=3.556, loss_gen=3.042, loss_fm=9.422,loss_mel=20.361, loss_kl=1.660
INFO:Den4ikSeekersV3:====> Epoch: 3 [2024-03-12 18:53:37] | (0:02:13.120967)
INFO:Den4ikSeekersV3:Train Epoch: 4 [5%]
INFO:Den4ikSeekersV3:[837, 9.996250468730469e-05]
INFO:Den4ikSeekersV3:loss_disc=4.042, loss_gen=3.086, loss_fm=10.286,loss_mel=17.957, loss_kl=1.465
INFO:Den4ikSeekersV3:====> Epoch: 4 [2024-03-12 18:55:53] | (0:02:15.803684)
INFO:Den4ikSeekersV3:Train Epoch: 5 [7%]
INFO:Den4ikSeekersV3:[1116, 9.995000937421877e-05]
INFO:Den4ikSeekersV3:loss_disc=4.149, loss_gen=2.704, loss_fm=9.035,loss_mel=18.226, loss_kl=1.469
INFO:Den4ikSeekersV3:Saving model and optimizer state at epoch 5 to ./logs\Den4ikSeekersV3\G_2333333.pth
INFO:Den4ikSeekersV3:Saving model and optimizer state at epoch 5 to ./logs\Den4ikSeekersV3\D_2333333.pth
INFO:Den4ikSeekersV3:saving ckpt Den4ikSeekersV3_e5:Success.
INFO:Den4ikSeekersV3:====> Epoch: 5 [2024-03-12 18:58:08] | (0:02:14.966068)
INFO:Den4ikSeekersV3:Train Epoch: 6 [9%]
INFO:Den4ikSeekersV3:[1395, 9.993751562304699e-05]
INFO:Den4ikSeekersV3:loss_disc=4.231, loss_gen=3.099, loss_fm=10.232,loss_mel=19.459, loss_kl=1.666
INFO:Den4ikSeekersV3:====> Epoch: 6 [2024-03-12 19:00:21] | (0:02:12.955729)
INFO:Den4ikSeekersV3:Train Epoch: 7 [11%]
INFO:Den4ikSeekersV3:[1674, 9.99250234335941e-05]
INFO:Den4ikSeekersV3:loss_disc=4.177, loss_gen=3.267, loss_fm=10.140,loss_mel=19.995, loss_kl=1.684
INFO:Den4ikSeekersV3:====> Epoch: 7 [2024-03-12 19:02:31] | (0:02:09.405485)
INFO:Den4ikSeekersV3:Train Epoch: 8 [13%]
INFO:Den4ikSeekersV3:[1953, 9.991253280566489e-05]
INFO:Den4ikSeekersV3:loss_disc=3.863, loss_gen=2.904, loss_fm=12.747,loss_mel=21.048, loss_kl=1.518
Process Process-1:
Traceback (most recent call last):
  File "multiprocessing\process.py", line 315, in _bootstrap
  File "multiprocessing\process.py", line 108, in run
  File "D:\INSTALLS\RVC\train_nsf_sim_cache_sid_load_pretrain.py", line 225, in run
    train_and_evaluate(
  File "D:\INSTALLS\RVC\train_nsf_sim_cache_sid_load_pretrain.py", line 461, in train_and_evaluate
    scaler.step(optim_g)
  File "D:\INSTALLS\RVC\runtime\lib\site-packages\torch\cuda\amp\grad_scaler.py", line 370, in step
    retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
  File "D:\INSTALLS\RVC\runtime\lib\site-packages\torch\cuda\amp\grad_scaler.py", line 290, in _maybe_opt_step
    retval = optimizer.step(*args, **kwargs)
  File "D:\INSTALLS\RVC\runtime\lib\site-packages\torch\optim\lr_scheduler.py", line 69, in wrapper
    return wrapped(*args, **kwargs)
  File "D:\INSTALLS\RVC\runtime\lib\site-packages\torch\optim\optimizer.py", line 280, in wrapper
    out = func(*args, **kwargs)
  File "D:\INSTALLS\RVC\runtime\lib\site-packages\torch\optim\optimizer.py", line 33, in _use_grad
    ret = func(self, *args, **kwargs)
  File "D:\INSTALLS\RVC\runtime\lib\site-packages\torch\optim\adamw.py", line 171, in step
    adamw(
  File "D:\INSTALLS\RVC\runtime\lib\site-packages\torch\optim\adamw.py", line 321, in adamw
    func(
  File "D:\INSTALLS\RVC\runtime\lib\site-packages\torch\optim\adamw.py", line 564, in _multi_tensor_adamw
    exp_avg_sq_sqrt = torch._foreach_sqrt(device_exp_avg_sqs)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
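
Looking at the traceback, the illegal access is reported from torch._foreach_sqrt inside the multi-tensor AdamW step. As an experiment I could try forcing the single-tensor optimizer path with foreach=False. The snippet below is only a sketch of that idea (where exactly optim_g is built in train_nsf_sim_cache_sid_load_pretrain.py is my assumption, but the foreach flag is standard torch.optim.AdamW, and the lr/betas/eps values are the ones from my config above):

```python
# Sketch of the experiment: build AdamW with foreach=False so it uses the
# single-tensor update loop instead of the torch._foreach_* kernels that
# appear in the traceback.
import torch
import torch.nn as nn

net_g = nn.Linear(4, 4)  # placeholder; in the real script this is the RVC generator

optim_g = torch.optim.AdamW(
    net_g.parameters(),
    lr=1e-4,             # learning_rate from the config in the log
    betas=(0.8, 0.99),   # betas from the config
    eps=1e-9,
    foreach=False,       # avoid the multi-tensor (foreach) code path
)
```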

I'm new to RVC training, so any help would be greatly appreciated! Thanks :3