fastspeech2 multiple GPU training error

MostafaAlaviyan commented 2 years ago

Hi, thanks for the valuable implementation. I trained FastSpeech2 on an LJSpeech-like dataset. I extracted durations with MFA and trained the model on a single GPU, and the model trains very well. When I try to train with two or more GPUs, training starts, but the error below occurs:

[train]:   0%|▏                                                                            | 800/250000 [06:45<12:49:42,  5.40it/s]2022-02-08 11:09:02,315 (base_trainer:978) INFO: (Step: 800) train_duration_loss = 0.2961.
2022-02-08 11:09:02,318 (base_trainer:978) INFO: (Step: 800) train_f0_loss = 0.6272.
2022-02-08 11:09:02,320 (base_trainer:978) INFO: (Step: 800) train_energy_loss = 0.6370.
2022-02-08 11:09:02,323 (base_trainer:978) INFO: (Step: 800) train_mel_loss_before = 0.3968.
2022-02-08 11:09:02,325 (base_trainer:978) INFO: (Step: 800) train_mel_loss_after = 0.4355.
[train]:   0%|▎                                                                            | 857/250000 [06:56<11:03:22,  6.26it/s]2022-02-08 11:09:13.387314: F ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:828] Non-OK-status: GpuLaunchKernel(ColumnReduceKernel<IN_T, T*, Op>, grid_dim, block_dim, 0, cu_stream, in, (T*)temp_storage.flat<int8_t>().data(), extent_x, extent_y, op, init) status: Internal: an illegal memory access was encountered
Aborted (core dumped)

Do you have any idea what could cause this?

MostafaAlaviyan commented 2 years ago

@ZDisket @dathudeptrai

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

binbinxue commented 2 years ago

I checked the preprocessing scripts: the author used the g2p_en package to convert text to phonemes. MFA, however, produces phonemes differently, even for the same language. Have you checked that the phoneme outputs are consistent? The illegal memory access could be due to a discrepancy in the phonemes.
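A quick way to test for this kind of mismatch is to compare the phoneme symbols in the MFA output against the symbols the model's text processor knows. The snippet below is only a minimal sketch and not code from the repo: `find_unknown_symbols`, the symbol list, and the utterance data are all hypothetical placeholders, and you would need to load your real symbol table and MFA alignments in their place.

```python
from collections import Counter


def find_unknown_symbols(utterances, known_symbols):
    """Count phoneme symbols that are missing from the model's symbol table.

    utterances: dict mapping an utterance ID to its list of phoneme strings
    known_symbols: iterable of symbols the text processor accepts
    """
    known = set(known_symbols)
    unknown = Counter()
    for phonemes in utterances.values():
        for phoneme in phonemes:
            if phoneme not in known:
                unknown[phoneme] += 1
    return unknown


# Hypothetical symbol table (assumption: stands in for the model's real one).
known_symbols = ["AA1", "B", "K", "T", "sil"]

# Hypothetical MFA output (assumption: phonemes already parsed per utterance).
mfa_output = {
    "utt_0001": ["K", "AA1", "T"],
    "utt_0002": ["B", "AA1", "T", "sp"],
    "utt_0003": ["spn", "K", "sp"],
}

unknown = find_unknown_symbols(mfa_output, known_symbols)
print(dict(unknown))  # symbols here would need mapping or filtering before training
```

If the result is non-empty, those symbols have no entry in the symbol table, so they are worth mapping or filtering before looking further into the multi-GPU setup.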

TensorSpeech / TensorFlowTTS

fastspeech2 multiple GPU training error #743