TensorSpeech / TensorFlowTTS

TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supports English, French, Korean, Chinese, and German; easy to adapt to other languages)
https://tensorspeech.github.io/TensorFlowTTS/
Apache License 2.0

"Invalid argument: required broadcastable shapes" when trying to train FS2 #732

Closed godspirit00 closed 2 years ago

godspirit00 commented 2 years ago

I was trying to train FastSpeech 2 on the Nancy Corpus. I extracted the durations with MFA and did the preprocessing as described in the README, but when I started training, I got the following error:

2022-01-06 15:31:18.739332: W tensorflow/core/framework/op_kernel.cc:1680] Invalid argument: required broadcastable shapes
Traceback (most recent call last):
  File "examples/fastspeech2/train_fastspeech2.py", line 417, in <module>
    main()
  File "examples/fastspeech2/train_fastspeech2.py", line 405, in main
    trainer.fit(
  File "/home/tony/Documents/TensorFlowTTS/tensorflow_tts/trainers/base_trainer.py", line 1010, in fit
    self.run()
  File "/home/tony/Documents/TensorFlowTTS/tensorflow_tts/trainers/base_trainer.py", line 104, in run
    self._train_epoch()
  File "/home/tony/Documents/TensorFlowTTS/tensorflow_tts/trainers/base_trainer.py", line 126, in _train_epoch
    self._train_step(batch)
  File "/home/tony/Documents/TensorFlowTTS/tensorflow_tts/trainers/base_trainer.py", line 782, in _train_step
    self.one_step_forward(batch)
  File "/home/tony/.virtualenvs/tftts/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 885, in __call__
    result = self._call(*args, **kwds)
  File "/home/tony/.virtualenvs/tftts/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 950, in _call
    return self._stateless_fn(*args, **kwds)
  File "/home/tony/.virtualenvs/tftts/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3039, in __call__
    return graph_function._call_flat(
  File "/home/tony/.virtualenvs/tftts/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1963, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/home/tony/.virtualenvs/tftts/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 591, in call
    outputs = execute.execute(
  File "/home/tony/.virtualenvs/tftts/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  required broadcastable shapes
     [[node tf_fast_speech2/add_1 (defined at /home/tony/Documents/TensorFlowTTS/tensorflow_tts/models/fastspeech2.py:185) ]]
     [[tf_fast_speech2/length_regulator/while/loop_body_control/_117/_109]]
  (1) Invalid argument:  required broadcastable shapes
     [[node tf_fast_speech2/add_1 (defined at /home/tony/Documents/TensorFlowTTS/tensorflow_tts/models/fastspeech2.py:185) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference__one_step_forward_27336]

Errors may have originated from an input operation.
Input Source operations connected to node tf_fast_speech2/add_1:
 tf_fast_speech2/encoder/layer_._2/mul (defined at /home/tony/Documents/TensorFlowTTS/tensorflow_tts/models/fastspeech.py:413)

Input Source operations connected to node tf_fast_speech2/add_1:
 tf_fast_speech2/encoder/layer_._2/mul (defined at /home/tony/Documents/TensorFlowTTS/tensorflow_tts/models/fastspeech.py:413)

Function call stack:
_one_step_forward -> _one_step_forward

This is exactly the same issue as #672. That issue was labeled "Bug", and it looks like it was never solved.

So what can I do to get the training started?

Thanks a lot!

dathudeptrai commented 2 years ago

This bug comes from the data, not from the model itself. I think you need to check which samples cause the problem using the code below:

from tqdm import tqdm
# Run every batch through the model; the batch that raises is the bad one.
for data in tqdm(train_dataloader):
    o = fastspeech(**data, training=True)

Note that you need to set batch_size = 1 so you can easily see which samples are causing the problem :D
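
A minimal sketch of a cheaper version of that check, which compares sequence lengths without running the model. It assumes each batch is a dict with the keys that appear in the batch printed later in this thread (`utt_ids`, `input_ids`, `duration_gts`, `f0_gts`, `energy_gts`) and that the values are arrays or eager tensors that `np.asarray` can convert. The helper name `find_mismatched_samples` is hypothetical and not part of TensorFlowTTS. With batch_size = 1 there should be no padding, so the second dimension of each array is the sample's real length:

```python
import numpy as np

# Keys whose sequence lengths are expected to agree for a single utterance.
LENGTH_KEYS = ("input_ids", "duration_gts", "f0_gts", "energy_gts")


def find_mismatched_samples(batches):
    """Return (utt_id, lengths) for each batch whose sequence lengths disagree.

    Assumes batch_size = 1, so shape[1] is the unpadded length of the sample.
    """
    mismatched = []
    for batch in batches:
        lengths = {key: int(np.asarray(batch[key]).shape[1]) for key in LENGTH_KEYS}
        if len(set(lengths.values())) > 1:
            utt_id = np.asarray(batch["utt_ids"])[0]
            mismatched.append((utt_id, lengths))
    return mismatched


def make_batch(utt_id, n_inputs, n_frames):
    """Build a synthetic one-sample batch (illustration only, not real data)."""
    return {
        "utt_ids": np.array([utt_id]),
        "input_ids": np.zeros((1, n_inputs), dtype=np.int32),
        "duration_gts": np.zeros((1, n_frames), dtype=np.int32),
        "f0_gts": np.zeros((1, n_frames), dtype=np.float32),
        "energy_gts": np.zeros((1, n_frames), dtype=np.float32),
    }


# Synthetic stand-in for the real dataloader: one consistent sample, one not.
batches = [make_batch(b"utt_ok", 5, 5), make_batch(b"utt_bad", 7, 5)]
result = find_mismatched_samples(batches)
for utt_id, lengths in result:
    print(utt_id, lengths)
```

On real data, the synthetic `batches` list would be replaced by the training dataloader, and any utterance ID it reports is a candidate for inspection or removal.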

godspirit00 commented 2 years ago

@dathudeptrai Sorry for the late reply.
I inserted the code you provided into train_fastspeech2.py after line 366 and added print(o). The output is as follows:

{'utt_ids': <tf.Tensor: shape=(16,), dtype=string, numpy=
array([b'HIPS-0848-08', b'SLAT297-004-03', b'SCIENCE-15354-01',
       b'NYT031-040-01', b'TIM_918', b'TIM_663', b'HIPS-0544-01',
       b'TIM_960', b'WAOPF-0323-03', b'ARC_087', b'WKRWTA-0320-00',
       b'RURAL-04942', b'TIM_366', b'YOYT-0106-01', b'SCIENCE-14201',
       b'SLAT059-004-04'], dtype=object)>, 'input_ids': <tf.Tensor: shape=(16, 129), dtype=int32, numpy=
array([[60, 46, 57, ...,  0,  0,  0],
       [60, 42, 11, ...,  0,  0,  0],
       [56, 52, 11, ...,  0,  0,  0],
       ...,
       [52, 49, 41, ...,  0,  0,  0],
       [57, 52, 11, ...,  0,  0,  0],
       [57, 45, 42, ...,  0,  0,  0]], dtype=int32)>, 'speaker_ids': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)>, 'duration_gts': <tf.Tensor: shape=(16, 92), dtype=int32, numpy=
array([[15,  7,  4, ...,  0,  0,  0],
       [14,  6,  5, ...,  0,  0,  0],
       [ 3, 16,  4, ...,  0,  0,  0],
       ...,
       [10,  8,  5, ...,  0,  0,  0],
       [ 8,  7,  3, ...,  0,  0,  0],
       [ 9,  5,  6, ...,  0,  0,  0]], dtype=int32)>, 'f0_gts': <tf.Tensor: shape=(16, 92), dtype=float32, numpy=
array([[ 0.        ,  0.8246144 ,  1.2876985 , ...,  0.        ,
         0.        ,  0.        ],
       [-0.5796331 ,  0.        ,  1.1341187 , ...,  0.        ,
         0.        ,  0.        ],
       [ 0.        ,  1.0941867 ,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       ...,
       [ 0.        ,  0.449585  ,  0.8970453 , ...,  0.        ,
         0.        ,  0.        ],
       [-0.34129396, -0.614922  , -0.27131552, ...,  0.        ,
         0.        ,  0.        ],
       [ 0.45157978,  0.        ,  0.76424146, ...,  0.        ,
         0.        ,  0.        ]], dtype=float32)>, 'energy_gts': <tf.Tensor: shape=(16, 92), dtype=float32, numpy=
array([[-1.2123926 , -0.6892038 ,  0.8126763 , ...,  0.        ,
         0.        ,  0.        ],
       [-0.354727  ,  0.01240734,  1.5514941 , ...,  0.        ,
         0.        ,  0.        ],
       [-1.2133728 ,  0.10915235, -0.7754631 , ...,  0.        ,
         0.        ,  0.        ],
       ...,
       [-1.2078284 ,  0.4109656 ,  1.9860119 , ...,  0.        ,
         0.        ,  0.        ],
       [-1.0766432 ,  0.08155485,  0.65240884, ...,  0.        ,
         0.        ,  0.        ],
       [-0.27268156, -0.24169774,  2.7491963 , ...,  0.        ,
         0.        ,  0.        ]], dtype=float32)>, 'mel_gts': <tf.Tensor: shape=(16, 633, 80), dtype=float32, numpy=
array([[[-1.2746664 , -1.7843498 , -2.2586834 , ..., -1.2931284 ,
         -1.3889788 , -1.3783925 ],
        [-1.0166491 , -1.6397812 , -2.181965  , ..., -1.204765  ,
         -1.1647713 , -1.1934797 ],
        [-0.86630285, -1.4328362 , -1.5542951 , ..., -1.2119982 ,
         -1.2231145 , -1.2497594 ],
        ...,
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ]],

       [[-2.3355176 , -2.289797  , -2.6863523 , ..., -1.8296297 ,
         -2.0163133 , -1.9705946 ],
        [-2.2332492 , -2.1360862 , -2.3810003 , ..., -1.7833395 ,
         -1.8442103 , -1.9748284 ],
        [-0.48657155, -0.52131754, -0.72493863, ..., -1.6943804 ,
         -1.8081826 , -1.9736167 ],
        ...,
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ]],

       [[-1.8014201 , -2.0903645 , -2.1711755 , ..., -1.5921981 ,
         -1.4874189 , -1.3082155 ],
        [-1.6379825 , -1.8543022 , -2.1112792 , ..., -1.4420213 ,
         -1.4051898 , -1.2607079 ],
        [-1.2559214 , -1.7089903 , -1.9510512 , ..., -1.334813  ,
         -1.3849427 , -1.3038315 ],
        ...,
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ]],

       ...,

       [[-1.4703877 , -1.5131235 , -1.4239932 , ..., -1.3939542 ,
         -1.400334  , -1.6912216 ],
        [-1.3440789 , -1.6951889 , -1.3633583 , ..., -1.4433973 ,
         -1.4994305 , -1.5762591 ],
        [-1.5572135 , -1.6220838 , -1.5402904 , ..., -1.513843  ,
         -1.589731  , -1.5659026 ],
        ...,
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ]],

       [[-2.2110486 , -2.533403  , -2.4404142 , ..., -2.0414464 ,
         -1.8681637 , -1.7730412 ],
        [-2.250254  , -2.4695659 , -2.6156392 , ..., -1.9038972 ,
         -1.6976963 , -1.8304156 ],
        [-1.721149  , -1.5887836 , -1.5241948 , ..., -1.3508788 ,
         -1.4653685 , -1.406599  ],
        ...,
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ]],

       [[-2.6205416 , -2.9442687 , -2.5861807 , ..., -1.725173  ,
         -2.025198  , -1.8375293 ],
        [-0.2048404 , -0.3904849 , -0.5025313 , ...,  0.43135187,
          0.47107655, -0.25049213],
        [ 0.48289043,  0.17719266,  0.39419013, ...,  1.0222603 ,
          1.0808944 ,  0.29906785],
        ...,
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ]]], dtype=float32)>, 'mel_lengths': <tf.Tensor: shape=(16,), dtype=int32, numpy=
array([477, 163, 223, 597, 141, 203, 633, 345, 283, 227, 489, 555, 283,
       261, 553, 439], dtype=int32)>}
2022-01-10 19:05:12.660309: W tensorflow/core/framework/op_kernel.cc:1680] Invalid argument: required broadcastable shapes
0it [00:52, ?it/s]
Traceback (most recent call last):
  File "examples/fastspeech2/train_fastspeech2.py", line 424, in <module>
    main()
  File "examples/fastspeech2/train_fastspeech2.py", line 370, in main
    o = fastspeech(**data, training=True)
  File "/home/tony/.virtualenvs/tftts/lib/python3.8/site-packages/keras/engine/base_layer.py", line 1037, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/home/tony/Documents/TensorFlowTTS/tensorflow_tts/models/fastspeech2.py", line 185, in call
    last_encoder_hidden_states += f0_embedding + energy_embedding
  File "/home/tony/.virtualenvs/tftts/lib/python3.8/site-packages/tensorflow/python/ops/math_ops.py", line 1367, in binary_op_wrapper
    return func(x, y, name=name)
  File "/home/tony/.virtualenvs/tftts/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py", line 206, in wrapper
    return target(*args, **kwargs)
  File "/home/tony/.virtualenvs/tftts/lib/python3.8/site-packages/tensorflow/python/ops/math_ops.py", line 1700, in _add_dispatch
    return gen_math_ops.add_v2(x, y, name=name)
  File "/home/tony/.virtualenvs/tftts/lib/python3.8/site-packages/tensorflow/python/ops/gen_math_ops.py", line 455, in add_v2
    _ops.raise_from_not_ok_status(e, name)
  File "/home/tony/.virtualenvs/tftts/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 6941, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: required broadcastable shapes [Op:AddV2]

So how do I solve the problem?
Thanks a lot!
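
One detail in the batch printed above may be relevant: `input_ids` has shape (16, 129), while `duration_gts`, `f0_gts` and `energy_gts` all have shape (16, 92). If the encoder output keeps the input length of 129 and the f0 and energy embeddings keep the length of 92, the addition at line 185 of fastspeech2.py would have nothing to broadcast against. The NumPy sketch below reproduces only that shape arithmetic. The hidden size of 4 is an arbitrary placeholder, the variable names are borrowed from the traceback for readability, and nothing here calls TensorFlowTTS:

```python
import numpy as np

HIDDEN = 4  # placeholder; the real model's hidden size is not shown in this thread

# Shapes taken from the printed batch: 129 input IDs vs. 92 duration/f0/energy steps.
last_encoder_hidden_states = np.zeros((16, 129, HIDDEN), dtype=np.float32)
f0_embedding = np.zeros((16, 92, HIDDEN), dtype=np.float32)
energy_embedding = np.zeros((16, 92, HIDDEN), dtype=np.float32)


def try_add(a, b):
    """Return the shape of a + b, or None if the shapes cannot be broadcast."""
    try:
        return (a + b).shape
    except ValueError:
        return None


# f0 and energy agree with each other, so this sum is fine.
pitch_energy_shape = try_add(f0_embedding, energy_embedding)

# 129 vs. 92 on the time axis cannot be broadcast, so this sum fails.
mismatch_shape = try_add(last_encoder_hidden_states, f0_embedding + energy_embedding)

print("f0 + energy:", pitch_energy_shape)
print("encoder + (f0 + energy):", mismatch_shape)
```

If this is what is happening, the number of input IDs per utterance and the number of MFA durations per utterance would need to agree before training can start.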

Tian14267 commented 2 years ago

@godspirit00 Hello, did you solve this problem? I get the same error, but I don't know how to solve it.

godspirit00 commented 2 years ago

@Tian14267 Not yet. I'm still waiting for @dathudeptrai's reply.

wxyhv commented 2 years ago

Have you solved this problem? @godspirit00 @dathudeptrai

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.