NVIDIA / flowtron

Flowtron is an auto-regressive flow-based generative network for text to speech synthesis with control over speech variation and style transfer
https://nv-adlr.github.io/Flowtron
Apache License 2.0

shape '[1, 1, 108607]' is invalid for input of size 217214 #134

Closed azman-i closed 2 years ago

azman-i commented 2 years ago

Hi, I am trying to run Flowtron on a Bangla TTS dataset. I have changed symbols.py (Bangla symbols included), cmudict.py (Bangla word and phoneme dictionary), and config.json (pointed to the Bangla dataset). Now I am facing the error below. What could be causing it? The command I ran: python train.py -c config.json -p train_config.output_directory=outdir data_config.use_attn_prior=1

Epoch: 0
/home/azman/texttospeech/Flowtron/flowtron/data.py:56: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /pytorch/torch/csrc/utils/tensor_numpy.cpp:141.)
  return torch.from_numpy(data).float(), sampling_rate
Traceback (most recent call last):
  File "train.py", line 415, in <module>
    train(n_gpus, rank, **train_config)
  File "train.py", line 281, in train
    for batch in train_loader:
  File "/home/azman/miniconda3/envs/ming_tts/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/home/azman/miniconda3/envs/ming_tts/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
    return self._process_data(data)
  File "/home/azman/miniconda3/envs/ming_tts/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "/home/azman/miniconda3/envs/ming_tts/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/azman/miniconda3/envs/ming_tts/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/azman/miniconda3/envs/ming_tts/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/azman/miniconda3/envs/ming_tts/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/azman/texttospeech/Flowtron/flowtron/data.py", line 177, in __getitem__
    mel = self.get_mel(audio)
  File "/home/azman/texttospeech/Flowtron/flowtron/data.py", line 153, in get_mel
    melspec = self.stft.mel_spectrogram(audio_norm)
  File "/home/azman/texttospeech/Flowtron/flowtron/audio_processing.py", line 130, in mel_spectrogram
    magnitudes, phases = self.stft_fn.transform(y)
  File "/home/azman/texttospeech/Flowtron/flowtron/audio_processing.py", line 214, in transform
    input_data = input_data.view(num_batches, 1, num_samples)
RuntimeError: shape '[1, 1, 108607]' is invalid for input of size 217214

letrongan commented 2 years ago

Can you share your config file? @azman63

azman-i commented 2 years ago

I have found the solution. If some of your audio files are stereo, you have to convert them from stereo to mono. https://stackoverflow.com/questions/5120555/how-can-i-convert-a-wav-from-stereo-to-mono-in-python
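
For anyone who hits the same error: the input size in the message, 217214, is exactly 2 × 108607, which is what you would expect when a two-channel file reaches the `input_data.view(num_batches, 1, num_samples)` call shown in the traceback. Below is a minimal sketch of one way to downmix a file using only the Python standard library. It is not part of Flowtron; the function name `stereo_to_mono` is hypothetical, and the sketch assumes 16-bit PCM WAV input and simply averages the channels.

```python
import array
import sys
import wave


def stereo_to_mono(src_path, dst_path):
    """Average all channels of a 16-bit PCM WAV file into a single channel.

    Hypothetical helper for illustration; assumes 16-bit PCM input.
    """
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        sample_width = src.getsampwidth()
        frame_rate = src.getframerate()
        raw = src.readframes(src.getnframes())

    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM WAV files")

    # Samples are interleaved: L0, R0, L1, R1, ...
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    # Average each frame's channels into one sample (integer division).
    mono = array.array(
        "h",
        (
            sum(samples[i:i + n_channels]) // n_channels
            for i in range(0, len(samples), n_channels)
        ),
    )
    if sys.byteorder == "big":
        mono.byteswap()  # back to little-endian before writing

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(sample_width)
        dst.setframerate(frame_rate)
        dst.writeframes(mono.tobytes())
```

Calling `stereo_to_mono("in.wav", "out.wav")` (placeholder file names) on each stereo file in the dataset before training should leave every file with one channel. Files that are already mono pass through unchanged.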