coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

[Bug] RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! #1816

Closed · BrukArkady closed this issue 2 years ago

BrukArkady commented 2 years ago

Describe the bug

I'm trying to run Tacotron2 training, but I receive RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

To Reproduce

CUDA_VISIBLE_DEVICES="0" python3 train_tacotron_ddc.py

Expected behavior

No response

Logs

admin@8f7837b57ed6:~/TTS$ CUDA_VISIBLE_DEVICES="0" python3 train_tacotron_ddc.py
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:False
 | > symmetric_norm:True
 | > mel_fmin:0
 | > mel_fmax:8000
 | > pitch_fmin:0.0
 | > pitch_fmax:640.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:60.0
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:2.718281828459045
 | > hop_length:256
 | > win_length:1024
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:False
 | > symmetric_norm:True
 | > mel_fmin:0
 | > mel_fmax:8000
 | > pitch_fmin:0.0
 | > pitch_fmax:640.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:60.0
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:2.718281828459045
 | > hop_length:256
 | > win_length:1024
 | > Found 9039 files in /home/admin/M-AI-Labs/resampled_to_22050/by_book/male/minaev/oblomov
 > Using CUDA: True
 > Number of GPUs: 1

 > Model has 47669492 parameters

 > Number of output frames: 6

 > EPOCH: 0/1000
 --> /home/admin/TTS/run-August-02-2022_11+05AM-903a77c1

> DataLoader initialization
| > Tokenizer:
        | > add_blank: False
        | > use_eos_bos: False
        | > use_phonemes: True
        | > phonemizer:
                | > phoneme language: ru-ru
                | > phoneme backend: gruut
| > Number of instances : 8949
 | > Preprocessing samples
 | > Max text length: 216
 | > Min text length: 3
 | > Avg text length: 99.18292546653258
 |
 | > Max audio length: 583682.0
 | > Min audio length: 26014.0
 | > Avg audio length: 182216.04805006145
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.

 > TRAINING (2022-08-02 11:05:38)
/home/admin/TTS/TTS/tts/models/tacotron2.py:331: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  alignment_lengths = (
 ! Run is removed from /home/admin/TTS/run-August-02-2022_11+05AM-903a77c1
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1534, in fit
    self._fit()
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1518, in _fit
    self.train_epoch()
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1283, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1115, in train_step
    outputs, loss_dict_new, step_time = self._optimize(
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 999, in _optimize
    outputs, loss_dict = self._model_train_step(batch, model, criterion)
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 955, in _model_train_step
    return model.train_step(*input_args)
  File "/home/admin/TTS/TTS/tts/models/tacotron2.py", line 339, in train_step
    loss_dict = criterion(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/admin/TTS/TTS/tts/layers/losses.py", line 440, in forward
    self.criterion_st(stopnet_output, stopnet_target, stop_target_length)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/admin/TTS/TTS/tts/layers/losses.py", line 193, in forward
    loss = functional.binary_cross_entropy_with_logits(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 3150, in binary_cross_entropy_with_logits
    return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Environment

{
    "CUDA": {
        "GPU": [
            "NVIDIA GeForce RTX 2080 Ti",
            "NVIDIA GeForce RTX 2080 Ti"
        ],
        "available": true,
        "version": "10.2"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "1.12.0+cu102",
        "TTS": "0.7.1",
        "numpy": "1.21.6"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.8.10",
        "version": "#36~20.04.1-Ubuntu SMP Fri Aug 27 08:06:32 UTC 2021"
    }
}

Additional context

No response

p0p4k commented 2 years ago

Check the device of the tensors at the line where the error is raised. Move them all to either the GPU or the CPU and run again.
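
For illustration, a minimal standalone sketch of that pattern (the tensors here are made up, not TTS code): print each tensor's .device, then move the stragglers onto a common device with .to():

import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

logits = torch.randn(4, 10, device=device)  # on the GPU when one is available
target = torch.ones(4, 10)                  # created on the CPU by default

print(logits.device, target.device)         # e.g. cuda:0 cpu -> the mismatch

# Move everything onto one device before calling the loss
loss = torch.nn.functional.binary_cross_entropy_with_logits(
    logits, target.to(logits.device)
)
print(loss.device)                          # matches logits.device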

arif334 commented 2 years ago

> Check the device of the tensors at the line where the error is raised. Move them all to either the GPU or the CPU and run again.

@p0p4k Can you elaborate a bit, please? I'm having the same problem.

arif334 commented 2 years ago

@BrukArkady The answer was given by @p0p4k above. To be precise, you should change line #3150 of functional.py from:

return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)

to:

return torch.binary_cross_entropy_with_logits(input.cuda(), target.cuda(), weight, pos_weight.cuda(), reduction_enum)

p0p4k commented 2 years ago

For a precise analysis of the error, add the following code in the TTS/TTS/tts/layers/losses.py file to see each tensor's device:

# add immediately above L193
tensors_to_check = [x.masked_select(mask), target.masked_select(mask), self.pos_weight]
for t in tensors_to_check:
    print(f"tensor is on device: {t.device}")  # prints e.g. cuda:0 or cpu

# add immediately above L197
tensors_to_check = [x, target, self.pos_weight]
for t in tensors_to_check:
    print(f"tensor is on device: {t.device}")

Then you can call .cuda() on a tensor to move it to the GPU. You can also do this directly, without running the diagnostic step above.
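
For example, if the diagnostic shows self.pos_weight on the CPU, a hypothetical one-line fix is to reassign it before the loss call (note that .cuda() returns a new tensor rather than modifying in place):

self.pos_weight = self.pos_weight.cuda()  # hypothetical; only valid when a GPU is present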

p0p4k commented 2 years ago

> @BrukArkady The answer was given by @p0p4k above. To be precise, you should change line #3150 of functional.py from:
>
> return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)
>
> to:
>
> return torch.binary_cross_entropy_with_logits(input.cuda(), target.cuda(), weight, pos_weight.cuda(), reduction_enum)

I would prefer making the changes in the TTS file rather than in the PyTorch library.
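
A minimal sketch of that TTS-side change, assuming the call at losses.py L193 uses the tensors listed in the diagnostic above (the exact surrounding code may differ):

# TTS/tts/layers/losses.py, around L193 (sketch)
loss = functional.binary_cross_entropy_with_logits(
    x.masked_select(mask),
    target.masked_select(mask),
    pos_weight=self.pos_weight.to(x.device),  # move the weight to wherever the logits live
)

Because .to() is a no-op when the tensor is already on the target device, this variant avoids hard-coding .cuda().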

arif334 commented 2 years ago

> I would prefer making the changes in the TTS file rather than in the PyTorch library.

Got it. Thanks.

p0p4k commented 2 years ago

> Got it. Thanks.

Then again, one issue: on a machine with no GPU, we would have to write an if/else for that. Can you make a PR?
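
For what it's worth, a device-agnostic pattern can sidestep the if/else entirely: register pos_weight as a buffer so it travels with the loss module wherever it is moved. A minimal sketch (the class name and default pos_weight value are illustrative, not the actual TTS code):

import torch
from torch import nn
from torch.nn import functional

class MaskedBCEWithLogits(nn.Module):  # illustrative stand-in, not the actual BCELossMasked
    def __init__(self, pos_weight: float = 10.0):
        super().__init__()
        # Buffers travel with the module: criterion.to(device) / .cuda() moves them too.
        self.register_buffer("pos_weight", torch.tensor([pos_weight]))

    def forward(self, x, target, mask):
        return functional.binary_cross_entropy_with_logits(
            x.masked_select(mask),
            target.masked_select(mask),
            pos_weight=self.pos_weight,
        )

# criterion = MaskedBCEWithLogits().to("cuda")  # pos_weight now lives on cuda:0
# On a CPU-only machine, drop the .to("cuda") call -- no if/else required.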

arif334 commented 2 years ago

> Then again, one issue: on a machine with no GPU, we would have to write an if/else for that. Can you make a PR?

I'm afraid I can't right now. I'll look into it once I get some spare time.

snufas commented 2 years ago

Same exact issue here. Can someone comment on this issue when it is fixed, so I can try again? Thanks.

erogol commented 2 years ago

Fixed by #1872