NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
https://nvidia.github.io/OpenSeq2Seq
Apache License 2.0

TypeError: failed to train Tacotron model on Chinese dataset #346

Closed: wujsy closed this issue 5 years ago

wujsy commented 5 years ago

Hi, I used the Tacotron model to train on a Chinese dataset, but a TypeError occurred:

tensorflow.python.framework.errors_impl.InvalidArgumentError: TypeError: magphase() got an unexpected keyword argument 'power'

Traceback (most recent call last):

  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/script_ops.py", line 206, in __call__
    ret = func(*args)
  File "/data/workspace/OpenSeq2Seq/open_seq2seq/data/text2speech/text2speech.py", line 442, in _parse_audio_transcript_element
    mel_basis=self._mel_basis
  File "/data/workspace/OpenSeq2Seq/open_seq2seq/data/text2speech/speech_utils.py", line 87, in get_speech_features_from_file
    hop_length, mag_power, feature_normalize, mean, std, data_min, mel_basis
  File "/data/workspace/OpenSeq2Seq/open_seq2seq/data/text2speech/speech_utils.py", line 147, in get_speech_features
    mag, _ = librosa.magphase(complex_spec, power=mag_power)
TypeError: magphase() got an unexpected keyword argument 'power'

 [[{{node PyFuncStateless}} = PyFuncStateless[Tin=[DT_STRING], Tout=[DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32], token="pyfunc_0", _device="/device:GPU:0"](arg0)]]
 [[node IteratorGetNext (defined at /data/wujiaxing/workspace/OpenSeq2Seq/open_seq2seq/data/text2speech/text2speech.py:338)  = IteratorGetNext[output_shapes=[[?,?], [?,1], [?,?,593], [?,?], [?,1]], output_types=[DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"](IteratorV2)]]
 [[{{node ForwardPass/tacotron_2_decoder/decoder_1/stop_token_proj/Tensordot/GatherV2/_1805}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_1888_...t/GatherV2", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

What's the problem?
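The traceback ends in `librosa.magphase(complex_spec, power=mag_power)`, and the installed librosa's `magphase()` does not accept a `power` keyword. To illustrate what that call is meant to compute, here is a NumPy-only sketch of the magnitude/phase split with the magnitude raised to `power`, following the behaviour newer librosa releases document for `magphase` (magnitude `|D| ** power`, plus a unit-modulus complex phase). The helper name `magphase_with_power` is hypothetical; this is not librosa's own code.

```python
import numpy as np

def magphase_with_power(D, power=1.0):
    """Split a complex spectrogram D into (|D| ** power, unit-modulus phase).

    Hypothetical stand-in for librosa.magphase(D, power=power); it depends only
    on NumPy, so it behaves the same whatever librosa version is installed.
    """
    D = np.asarray(D)
    mag = np.abs(D) ** power           # magnitude raised to the requested power
    phase = np.exp(1j * np.angle(D))   # complex phase factor with |phase| == 1
    return mag, phase

# A tiny 2x2 complex "spectrogram"; power=2 gives a power spectrogram.
D = np.array([[3 + 4j, 1j], [-2, 0.5]])
mag, phase = magphase_with_power(D, power=2.0)
print(mag)
```

Recombining `np.sqrt(mag) * phase` recovers `D` when `power=2`. In this thread the actual fix turned out to be upgrading librosa (0.5.1 to 0.6.1), so a shim like this is only a fallback sketch.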

borisgin commented 5 years ago

Which librosa version do you have?

wujsy commented 5 years ago

@borisgin Hi, thanks for your response. My librosa version is 0.5.1.
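Because the failure depends on the installed librosa release, a quick version check can save a round trip. The sketch below reads `librosa.__version__` and compares it with 0.6.1, the version the reporter later confirms works here (0.5.1 raised the TypeError). The helper names are hypothetical, and the comparison only handles plain numeric versions such as `0.6.1`.

```python
import importlib

# Version reported to work in this thread; 0.5.1 raised the TypeError.
KNOWN_GOOD_LIBROSA = (0, 6, 1)

def installed_version(package):
    """Return the package's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(package)
    except ImportError:
        return None
    return getattr(module, "__version__", None)

def version_tuple(version):
    """Convert '0.6.1' to (0, 6, 1).

    Non-digit text after a component's leading digits (e.g. 'rc1') is ignored;
    parsing stops at a component that has no leading digit.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

if __name__ == "__main__":
    found = installed_version("librosa")
    if found is None:
        print("librosa is not installed")
    elif version_tuple(found) < KNOWN_GOOD_LIBROSA:
        print("librosa", found, "is older than 0.6.1; consider upgrading")
    else:
        print("librosa", found, "should accept magphase(..., power=...)")
```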

wujsy commented 5 years ago

Hi borisgin, I updated librosa to 0.6.1 and the TypeError is gone, but a new problem has arisen:

tensorflow.python.framework.errors_impl.UnknownError: AssertionError: num_features for spectrogram should be <= (fs * window_size // 2 + 1)

Traceback (most recent call last):

  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/script_ops.py", line 206, in __call__
    ret = func(*args)
  File "/data/workspace/OpenSeq2Seq/open_seq2seq/data/text2speech/text2speech.py", line 442, in _parse_audio_transcript_element
    mel_basis=self._mel_basis
  File "/data/workspace/OpenSeq2Seq/open_seq2seq/data/text2speech/speech_utils.py", line 87, in get_speech_features_from_file
    hop_length, mag_power, feature_normalize, mean, std, data_min, mel_basis
  File "/data/workspace/OpenSeq2Seq/open_seq2seq/data/text2speech/speech_utils.py", line 152, in get_speech_features
    "num_features for spectrogram should be <= (fs * window_size // 2 + 1)"
AssertionError: num_features for spectrogram should be <= (fs * window_size // 2 + 1)

Is this a problem with my dataset, or something else? My CSV file looks like this:

BAC009S0724W0121|广州市房地产中介协会分析|广州市房地产中介协会分析
BAC009S0724W0122|广州市房地产中介协会还表示|广州市房地产中介协会还表示
BAC009S0724W0123|相比于其他一线城市|相比于其他一线城市
BAC009S0724W0124|广州二手住宅市场表现一直相对稳健|广州二手住宅市场表现一直相对稳健
BAC009S0724W0125|而在股市大幅震荡的环境下|而在股市大幅震荡的环境下
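These rows use a pipe-separated layout with three fields per line: an utterance ID, the transcript, and a normalized transcript (the same shape as LJSpeech's `metadata.csv`). As a minimal sketch, assuming exactly that three-column layout and UTF-8 text, a quick parse like the one below can confirm every row splits cleanly before training. `read_pipe_metadata` is a hypothetical helper, not part of OpenSeq2Seq.

```python
import csv
import io

def read_pipe_metadata(lines):
    """Parse 'id|transcript|normalized' rows, rejecting rows with the wrong field count.

    Hypothetical validation helper; assumes exactly three '|'-separated fields.
    """
    rows = []
    # QUOTE_NONE: quote characters inside transcripts are kept as literal text.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for line_no, fields in enumerate(reader, start=1):
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(fields)}")
        rows.append(tuple(field.strip() for field in fields))
    return rows

sample = (
    "BAC009S0724W0121|广州市房地产中介协会分析|广州市房地产中介协会分析\n"
    "BAC009S0724W0122|广州市房地产中介协会还表示|广州市房地产中介协会还表示\n"
)
rows = read_pipe_metadata(io.StringIO(sample))
print(len(rows), rows[0][0])  # 2 BAC009S0724W0121
```

On a real file, open it with `open(path, encoding="utf-8", newline="")` and pass the file object in place of the `StringIO`.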

borisgin commented 5 years ago

Did you try to reduce the number of features in the spectrogram?
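For context on the assertion: an STFT computed with an `n_fft`-sample window yields `n_fft // 2 + 1` one-sided frequency bins, so a magnitude spectrogram cannot have more features than that (`fs * window_size` in the assertion message is the window length in samples). A rough sketch of the bound and a check mirroring the assertion's intent, with hypothetical helper names:

```python
def max_spectrogram_features(n_fft):
    """Number of one-sided STFT frequency bins for an n_fft-point transform."""
    return n_fft // 2 + 1

def window_samples(fs, window_size):
    """Window length in samples for sample rate fs (Hz) and window_size (seconds)."""
    return int(round(fs * window_size))

def check_num_features(num_features, n_fft):
    """Raise if num_features exceeds the available bins; otherwise return it."""
    limit = max_spectrogram_features(n_fft)
    if num_features > limit:
        raise ValueError(
            f"num_features={num_features} exceeds {limit} bins for n_fft={n_fft}; "
            "reduce num_features (or use a larger n_fft)"
        )
    return num_features

print(max_spectrogram_features(1024))                         # 513
print(max_spectrogram_features(window_samples(16000, 0.05)))  # 401
```

For example, a 1024-point FFT supports at most 513 magnitude features, while an 800-sample window (50 ms at 16 kHz) supports at most 401; asking for more trips the assertion, which is consistent with reducing the number of features resolving it.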

wujsy commented 5 years ago

Yes, the problem was solved by reducing the number of features. Thanks, @borisgin!