caizexin / tf_multispeakerTTS_fc

The TensorFlow version of multi-speaker TTS training with feedback constraint
MIT License

Problem running the speaker embedding network while training from scratch. #3

Open rishabhjain16 opened 3 years ago

rishabhjain16 commented 3 years ago

Hi @caizexin ,

I have been trying to implement your work on my own dataset. I am trying to run the speaker embedding network in the deep_speaker folder using python train.py, but I keep running into this error:

Automatic speaker valification training set to a maximum of 300000 steps.
2021-02-22 20:42:02.327491: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2021-02-22 20:42:02.358786: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library nvcuda.dll
2021-02-22 20:42:02.446559: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:17:00.0
2021-02-22 20:42:02.463437: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2021-02-22 20:42:02.476295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2021-02-22 20:42:03.162290: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-02-22 20:42:03.170635: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0
2021-02-22 20:42:03.179616: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N
2021-02-22 20:42:03.188728: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8616 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:17:00.0, compute capability: 7.5)
Loading checkpoint voxmyst_resnet34\gvector.ckpt-0
WARNING:tensorflow:From C:\Users\rjain\AppData\Local\Continuum\anaconda3\envs\mstts\lib\site-packages\tensorflow\python\training\saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2021-02-22 20:42:09.033503: W .\tensorflow/core/framework/model.h:213] Encountered a stop event that was not preceded by a start event.
Exiting due to exception: 2 root error(s) found.
  (0) Invalid argument: 1-th value returned by pyfunc_0 is int32, but expects int64
         [[{{node PyFunc}}]]
         [[IteratorGetNext]]
         [[resnet/unit_2_1/sub1/bn1/AssignMovingAvg/AssignSub/_1213]]
  (1) Invalid argument: 1-th value returned by pyfunc_0 is int32, but expects int64
         [[{{node PyFunc}}]]
         [[IteratorGetNext]]
0 successful operations.
0 derived errors ignored.
Traceback (most recent call last):
  File "C:\Users\rjain\AppData\Local\Continuum\anaconda3\envs\mstts\lib\site-packages\tensorflow\python\client\session.py", line 1356, in _do_call
    return fn(*args)
  File "C:\Users\rjain\AppData\Local\Continuum\anaconda3\envs\mstts\lib\site-packages\tensorflow\python\client\session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Users\rjain\AppData\Local\Continuum\anaconda3\envs\mstts\lib\site-packages\tensorflow\python\client\session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: 1-th value returned by pyfunc_0 is int32, but expects int64
         [[{{node PyFunc}}]]
         [[IteratorGetNext]]
         [[resnet/unit_2_1/sub1/bn1/AssignMovingAvg/AssignSub/_1213]]
  (1) Invalid argument: 1-th value returned by pyfunc_0 is int32, but expects int64
         [[{{node PyFunc}}]]
         [[IteratorGetNext]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 113, in <module>
    step, _, loss, acc = sess.run(fetches=fetches, feed_dict=feed_dict)
  File "C:\Users\rjain\AppData\Local\Continuum\anaconda3\envs\mstts\lib\site-packages\tensorflow\python\client\session.py", line 950, in run
    run_metadata_ptr)
  File "C:\Users\rjain\AppData\Local\Continuum\anaconda3\envs\mstts\lib\site-packages\tensorflow\python\client\session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\rjain\AppData\Local\Continuum\anaconda3\envs\mstts\lib\site-packages\tensorflow\python\client\session.py", line 1350, in _do_run
    run_metadata)
  File "C:\Users\rjain\AppData\Local\Continuum\anaconda3\envs\mstts\lib\site-packages\tensorflow\python\client\session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: 1-th value returned by pyfunc_0 is int32, but expects int64
         [[{{node PyFunc}}]]
         [[IteratorGetNext]]
         [[resnet/unit_2_1/sub1/bn1/AssignMovingAvg/AssignSub/_1213]]
  (1) Invalid argument: 1-th value returned by pyfunc_0 is int32, but expects int64
         [[{{node PyFunc}}]]
         [[IteratorGetNext]]
0 successful operations.
0 derived errors ignored.

I have preprocessed the dataset as per the instructions mentioned in the repo. Here is an example: data_voxtest.zip

I can't really find a workaround for this. Any help is appreciated.

Thanks in advance.

rishabhjain16 commented 3 years ago

I would also appreciate any documentation you might have for training from scratch; I am sure it would help others who try to reimplement your work as well. Your paper is really interesting and I am trying to reimplement it with my own dataset, so any help is much appreciated.

caizexin commented 3 years ago

Hi Rishabh, I assume you are running the code on a Windows system. The error is raised because Windows and Linux systems have different default integer types. I think the answer from this link may help you. However, it's strange that the process raises an error on node unit_2_1 while passing other nodes like (unit_0_0, unit_1_0), and I did not see any returned integer here or in the feed dictionary.
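
For reference, a minimal sketch of the kind of explicit cast that usually resolves this mismatch, assuming the dataset pipeline wraps a Python parse function with tf.py_func and declares int64 outputs (the parse function and label handling below are illustrative, not the repo's actual feeder code):

import numpy as np
import tensorflow as tf

# Hypothetical stand-in for a feeder parse function. On Windows, np.array() of
# a Python int defaults to int32, which breaks a tf.py_func that declared
# tf.int64 outputs, so the label is cast explicitly.
def _parse_example(idx):
    features = np.random.randn(10, 64).astype(np.float32)
    label = np.array(int(idx) % 2, dtype=np.int64)  # without dtype=np.int64 this would be int32 on Windows
    return features, label

dataset = tf.data.Dataset.range(4).map(
    lambda i: tf.py_func(_parse_example, [i], [tf.float32, tf.int64]))

with tf.Session() as sess:
    features, label = sess.run(dataset.make_one_shot_iterator().get_next())
    print(label.dtype)  # int64 on any platform once the cast is in place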

My advice for this issue would be

  1. I suspect that the main problem is not from the line specified in the error log. Try assigning x before or after this line to another variable and running a session that fetches the value of this variable, to see whether the problem is caused by a specific node. It may be a general issue, in which case it is difficult to identify which function or assignment causes the error.
  2. Since I do not have a Windows machine for debugging, I would recommend training on Ubuntu or another Linux machine if you have one.
rishabhjain16 commented 3 years ago

Hi @caizexin,

Thank you so much for your help and getting back to me.

Hi Rishabh, I assume you are running the code on a Windows system. The error is raised because Windows and Linux systems have different default integer types. I think the answer from this link may help you. However, it's strange that the process raises an error on node unit_2_1 while passing other nodes like (unit_0_0, unit_1_0), and I did not see any returned integer here or in the feed dictionary.

Yes, I am using a Windows machine at the moment. I don't have a Linux machine, but I think I can try installing Docker and finding a workaround that way. Just to give you an update: I had already read the link you mentioned above and tried changing the integer type in the code itself, but that didn't work.

My advice for this issue would be

  1. I suspect that the main problem is not from the line specified in the error log. Try assigning x before or after this line to another variable and running a session that fetches the value of this variable, to see whether the problem is caused by a specific node. It may be a general issue, in which case it is difficult to identify which function or assignment causes the error.

I didn't quite understand what you mean by assigning x to another variable and running a session to fetch the value of that variable. I am new to TensorFlow, so it would be great if you could give me an example of how to do this.

Do you mean something like this?

import tensorflow as tf

# create a variable, initialize it, then fetch its value in a session
x = tf.Variable([1.0, 2.0])
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    v = sess.run(x)
    print(v)

Another thing is that the line you mentioned above belongs to Resnet.py in feedback_synthesizer/models/embedding/Resnet.py, but what I am trying to run is the speaker embedding model in the deep_speaker folder, which contains a different resnet.py. I can see that both directories contain similar files and folders, but resnet.py and Resnet.py do differ. So shouldn't I modify the resnet.py in the deep_speaker folder? I am not sure if they are connected somehow.

  2. Since I do not have a Windows machine for debugging, I would recommend training on Ubuntu or another Linux machine if you have one.

Yes, I will try to set up a Linux machine and run my experiment on it as well, to see if it works there.

rishabhjain16 commented 3 years ago

Just to give you an update, I started my encoder training again on an Ubuntu Docker machine and the model seems to be working on it. I still have some doubts, though. Let's hope it keeps working for now.

caizexin commented 3 years ago

Great, good to hear that things are getting better. Just to answer your previous question regarding variable assignment and debugging: what I meant is that we can assign the x after that line in resnet.py (sorry, I was referring to the wrong Resnet.py last time) to a variable named self.tmp_var.

self.tmp_var = x

Then in the training code, instead of fetching our results with fetches = [train_resnet.global_step, train_resnet.train_op, train_resnet.cost, train_resnet.accuracy], you can use fetches = [train_resnet.tmp_var] to get the x value.
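
Putting the two pieces together, the temporary debugging change might look roughly like this (a sketch only; train_resnet, fetches, sess, and feed_dict are the names from the training snippet above, and the placement in resnet.py is assumed):

# in resnet.py, right after the suspect line (placement assumed)
self.tmp_var = x                          # expose the intermediate tensor for debugging

# in train.py, temporarily replace the usual fetches
fetches = [train_resnet.tmp_var]          # instead of [global_step, train_op, cost, accuracy]
tmp_value = sess.run(fetches=fetches, feed_dict=feed_dict)[0]
print(tmp_value.shape, tmp_value.dtype)   # if this run also fails, the problem is upstream of x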

Good luck.

rishabhjain16 commented 3 years ago

Got it. Thanks a ton. I will try it out on the Windows machine to see if I can figure out the problem.

I have another query that I have been trying to figure out. I think the dataset I am using might contain some empty folders, or folders with very short audio files. I am not entirely sure about that; it is what I inferred from the error I am getting. I have tried a few things I found online but couldn't get my head around it. My embedding model runs for around 1000 steps and then gives the following error.

W tensorflow/core/framework/op_kernel.cc:1490] Invalid argument: ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'
Traceback (most recent call last):

  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/script_ops.py", line 209, in __call__
    ret = func(*args)

  File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/feeder_wav.py", line 49, in _parse_func
    fbanks = self._process_wave(wav_file.decode(), num_frames)

  File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/feeder_wav.py", line 30, in _process_wave
    wav = audio.trim_silence(wav, audio_hparams)

  File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/datasets/audio.py", line 36, in trim_silence
    return librosa.effects.trim(wav, top_db= hparams.trim_top_db, frame_length=hparams.trim_fft_size, hop_length=hparams.trim_hop_size)[0]

  File "/opt/conda/lib/python3.7/site-packages/librosa/effects.py", line 498, in trim
    y, frame_length=frame_length, hop_length=hop_length, ref=ref, top_db=top_db

  File "/opt/conda/lib/python3.7/site-packages/librosa/effects.py", line 448, in _signal_to_frame_nonsilent
    mse = feature.rms(y=y_mono, frame_length=frame_length, hop_length=hop_length) ** 2

  File "/opt/conda/lib/python3.7/site-packages/librosa/feature/spectral.py", line 925, in rms
    y = np.pad(y, int(frame_length // 2), mode=pad_mode)

  File "<__array_function__ internals>", line 6, in pad

  File "/opt/conda/lib/python3.7/site-packages/numpy/lib/arraypad.py", line 816, in pad
    "'constant' or 'empty'".format(axis)

ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'

Exiting due to exception: ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'
Traceback (most recent call last):

  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/script_ops.py", line 209, in __call__
    ret = func(*args)

  File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/feeder_wav.py", line 49, in _parse_func
    fbanks = self._process_wave(wav_file.decode(), num_frames)

  File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/feeder_wav.py", line 30, in _process_wave
    wav = audio.trim_silence(wav, audio_hparams)

  File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/datasets/audio.py", line 36, in trim_silence
    return librosa.effects.trim(wav, top_db= hparams.trim_top_db, frame_length=hparams.trim_fft_size, hop_length=hparams.trim_hop_size)[0]

  File "/opt/conda/lib/python3.7/site-packages/librosa/effects.py", line 498, in trim
    y, frame_length=frame_length, hop_length=hop_length, ref=ref, top_db=top_db

  File "/opt/conda/lib/python3.7/site-packages/librosa/effects.py", line 448, in _signal_to_frame_nonsilent
    mse = feature.rms(y=y_mono, frame_length=frame_length, hop_length=hop_length) ** 2

  File "/opt/conda/lib/python3.7/site-packages/librosa/feature/spectral.py", line 925, in rms
    y = np.pad(y, int(frame_length // 2), mode=pad_mode)

  File "<__array_function__ internals>", line 6, in pad

  File "/opt/conda/lib/python3.7/site-packages/numpy/lib/arraypad.py", line 816, in pad
    "'constant' or 'empty'".format(axis)

ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'

         [[{{node PyFunc}}]]
         [[IteratorGetNext]]
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'
Traceback (most recent call last):

  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/script_ops.py", line 209, in __call__
    ret = func(*args)

  File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/feeder_wav.py", line 49, in _parse_func
    fbanks = self._process_wave(wav_file.decode(), num_frames)

  File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/feeder_wav.py", line 30, in _process_wave
    wav = audio.trim_silence(wav, audio_hparams)

  File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/datasets/audio.py", line 36, in trim_silence
    return librosa.effects.trim(wav, top_db= hparams.trim_top_db, frame_length=hparams.trim_fft_size, hop_length=hparams.trim_hop_size)[0]

  File "/opt/conda/lib/python3.7/site-packages/librosa/effects.py", line 498, in trim
    y, frame_length=frame_length, hop_length=hop_length, ref=ref, top_db=top_db

  File "/opt/conda/lib/python3.7/site-packages/librosa/effects.py", line 448, in _signal_to_frame_nonsilent
    mse = feature.rms(y=y_mono, frame_length=frame_length, hop_length=hop_length) ** 2

  File "/opt/conda/lib/python3.7/site-packages/librosa/feature/spectral.py", line 925, in rms
    y = np.pad(y, int(frame_length // 2), mode=pad_mode)

  File "<__array_function__ internals>", line 6, in pad

  File "/opt/conda/lib/python3.7/site-packages/numpy/lib/arraypad.py", line 816, in pad
    "'constant' or 'empty'".format(axis)

ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'

         [[{{node PyFunc}}]]
         [[IteratorGetNext]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 113, in <module>
    step, _, loss, acc = sess.run(fetches=fetches, feed_dict=feed_dict)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'
Traceback (most recent call last):

  File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/script_ops.py", line 209, in __call__
    ret = func(*args)

  File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/feeder_wav.py", line 49, in _parse_func
    fbanks = self._process_wave(wav_file.decode(), num_frames)

  File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/feeder_wav.py", line 30, in _process_wave
    wav = audio.trim_silence(wav, audio_hparams)

  File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/datasets/audio.py", line 36, in trim_silence
    return librosa.effects.trim(wav, top_db= hparams.trim_top_db, frame_length=hparams.trim_fft_size, hop_length=hparams.trim_hop_size)[0]

  File "/opt/conda/lib/python3.7/site-packages/librosa/effects.py", line 498, in trim
    y, frame_length=frame_length, hop_length=hop_length, ref=ref, top_db=top_db

  File "/opt/conda/lib/python3.7/site-packages/librosa/effects.py", line 448, in _signal_to_frame_nonsilent
    mse = feature.rms(y=y_mono, frame_length=frame_length, hop_length=hop_length) ** 2

  File "/opt/conda/lib/python3.7/site-packages/librosa/feature/spectral.py", line 925, in rms
    y = np.pad(y, int(frame_length // 2), mode=pad_mode)

  File "<__array_function__ internals>", line 6, in pad

  File "/opt/conda/lib/python3.7/site-packages/numpy/lib/arraypad.py", line 816, in pad
    "'constant' or 'empty'".format(axis)

ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'

         [[{{node PyFunc}}]]
         [[IteratorGetNext]]

What do you think I should do? Or what could be the reason for this error? Thanks in advance.

caizexin commented 3 years ago

I would recommend checking your data first; it should not take too long. For this specific issue, you can simply use a for loop to load all your data and call audio.trim_silence(wav, audio_hparams). Then you will know which file causes the problem, and you can look at that wav file and address the issue.
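
A rough sketch of such a check, reusing the same trim_silence call as deep_speaker/feeder_wav.py (the glob pattern, sample rate, and hparams import below are assumptions about how your data and hyperparameters are laid out):

import glob
import librosa
from datasets import audio      # the repo's audio helpers used by feeder_wav.py
from hparams import hparams     # assumed to provide sample_rate and the trim_* settings

# Load every wav once, outside the tf.data pipeline, and flag files that are
# empty or fully silent, since those are the ones np.pad chokes on inside
# librosa.effects.trim.
for wav_path in glob.glob('/path/to/your/wavs/**/*.wav', recursive=True):
    try:
        wav, _ = librosa.load(wav_path, sr=hparams.sample_rate)
        if len(wav) == 0:
            print('empty file:', wav_path)
            continue
        if len(audio.trim_silence(wav, hparams)) == 0:
            print('only silence:', wav_path)
    except Exception as e:
        print('failed:', wav_path, repr(e))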

rishabhjain16 commented 3 years ago

Okay. I will do that. Thanks