Open rishabhjain16 opened 3 years ago
I would also appreciate it if you have any documentation for training from scratch. I am sure it would also help others who try to implement your work from scratch. Your paper is really interesting and I am trying to reimplement it with my own dataset, so any help is really appreciated.
Hi Rishabh, I assume you are running the code on a Windows system. The error is raised because Windows and Linux systems have different default integer types. I think the answer from this link may help you. However, it's strange that the process raises an error on node unit_2_1 while passing other nodes like (unit_0_0, 1_0), and I did not see any integer being returned here or in the feed dictionary.
My advice for this issue would be
Hi @caizexin,
Thank you so much for your help and getting back to me.
Hi Rishabh, I assume you are running the code on a Windows system. The error is raised because Windows and Linux systems have different default integer types. I think the answer from this link may help you. However, it's strange that the process raises an error on node unit_2_1 while passing other nodes like (unit_0_0, 1_0), and I did not see any integer being returned here or in the feed dictionary.
Yes, I am using a Windows machine at the moment. I don't have a Linux machine, but I think I can try installing Docker and find a workaround that way. Just to give you an update: I had already read the link you mentioned above and tried changing the integer type in the code itself, but that didn't work.
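For anyone following along, here is a minimal sketch of the platform difference mentioned above. It only uses the standard library and checks the width of the C long type, which NumPy releases before 2.0 used as their default integer. On 64-bit Windows a C long is normally 32 bits, while on 64-bit Linux it is normally 64 bits, so an array created without an explicit dtype could come out as int32 on one system and int64 on the other. The function names are made up for illustration, and whether this is the actual cause of the error in this thread is not confirmed.

```python
import ctypes
import platform
import struct


def c_long_bits():
    """Width in bits of the C `long` type on this platform."""
    return ctypes.sizeof(ctypes.c_long) * 8


def pointer_bits():
    """Width in bits of a pointer, i.e. whether Python is a 32- or 64-bit build."""
    return struct.calcsize("P") * 8


# Report what this machine uses; compare the output on Windows and on Linux.
print("{}: C long is {} bits, pointers are {} bits".format(
    platform.system(), c_long_bits(), pointer_bits()))
```

If the two widths differ on your machine, the usual way to avoid surprises is to set the dtype explicitly wherever an integer array is handed to TensorFlow (for example with np.asarray(values, dtype=np.int64)), so that it matches the dtype the graph expects regardless of platform.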
My advice for this issue would be
- I suspect that the main problem does not come from the line specified in the error log. Try to assign x before or after this line to another variable and run a session that fetches the value of this variable to see whether the problem is caused by a specific node. It may be a general issue for which it is difficult to identify the function or assignment that causes the error.
I didn't quite understand what you mean by assigning x to another variable and running a session to fetch the value of this variable. I am new to TensorFlow, so it would be great if you could give me an example of how to do this.
Do you mean something like this?
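import tensorflow as tf  # TensorFlow 1.x API

x = tf.Variable([1.0, 2.0])
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)  # initialize x before reading it
    v = sess.run(x)  # fetch the current value of x
    print(v)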
Another thing: the line you mentioned above belongs to feedback_synthesizer/models/embedding/Resnet.py, but what I am trying to run is the speaker embedding model in the deep_speaker folder, which contains a different resnet.py. I can see that both directories contain similar files and folders, but resnet.py and Resnet.py differ. So shouldn't I modify the resnet.py in the deep_speaker folder? I am not sure whether they are connected somehow.
- Since I do not have a Windows machine for debugging, I would recommend training on Ubuntu or another Linux machine if you have one.
Yes, I will try to prepare a Linux machine and will try running my experiment again on that as well to see if it works.
Just to give you an update, I started my encoder training again on an Ubuntu machine in Docker and my model seems to be working there. I still have some doubts though. Let's hope it works fine for now.
Great. Good to hear that things are getting better. Just to answer your previous question regarding variable assignment and debugging: what I meant is that we can assign the x after that line in resnet.py (sorry, I was referring to the wrong Resnet.py last time) to a variable named self.tmp_var.
self.tmp_var = x
Then in the training code, instead of fetching our results with fetches = [train_resnet.global_step, train_resnet.train_op, train_resnet.cost, train_resnet.accuracy], you can use fetches = [train_resnet.tmp_var] to get the x value.
Good luck.
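A minimal sketch of the debugging pattern described above, in case a complete example helps. It assumes a TensorFlow install that provides tensorflow.compat.v1. TinyModel is a hypothetical stand-in for the model class in resnet.py, and the two operations inside it are placeholders for the real layers; only the self.tmp_var assignment and the reduced fetches list mirror the advice.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # use graph mode and sessions, as in TF 1.x


class TinyModel(object):
    """Hypothetical stand-in for the model class defined in resnet.py."""

    def __init__(self, inputs):
        x = inputs * 2.0
        # Expose the intermediate tensor so that a session can fetch it.
        self.tmp_var = x
        x = tf.nn.relu(x - 1.0)
        self.cost = tf.reduce_mean(x)


inputs = tf.placeholder(tf.float32, shape=[None, 3], name="inputs")
model = TinyModel(inputs)

with tf.Session() as sess:
    # Fetch only the intermediate tensor instead of train_op, cost, etc.
    fetches = [model.tmp_var]
    feed_dict = {inputs: np.ones((2, 3), dtype=np.float32)}
    (tmp_value,) = sess.run(fetches=fetches, feed_dict=feed_dict)
    print(tmp_value.dtype, tmp_value.shape)
```

If fetching tmp_var succeeds, you can move the assignment to a later line and repeat until the fetch fails, which narrows down the node that triggers the error.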
Got it. Thanks a ton. I will try it out on the Windows machine to see if I can figure out the problem.
I have another query which I have been trying to figure out. I think the dataset that I am using might contain some empty folders, or folders with very short recordings. I am not sure about that; it is what I inferred from the error I am getting. I have tried a few things I found online but couldn't get my head around it. My embedding model runs for approximately 1000 steps and then gives the following error.
W tensorflow/core/framework/op_kernel.cc:1490] Invalid argument: ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/script_ops.py", line 209, in __call__
ret = func(*args)
File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/feeder_wav.py", line 49, in _parse_func
fbanks = self._process_wave(wav_file.decode(), num_frames)
File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/feeder_wav.py", line 30, in _process_wave
wav = audio.trim_silence(wav, audio_hparams)
File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/datasets/audio.py", line 36, in trim_silence
return librosa.effects.trim(wav, top_db= hparams.trim_top_db, frame_length=hparams.trim_fft_size, hop_length=hparams.trim_hop_size)[0]
File "/opt/conda/lib/python3.7/site-packages/librosa/effects.py", line 498, in trim
y, frame_length=frame_length, hop_length=hop_length, ref=ref, top_db=top_db
File "/opt/conda/lib/python3.7/site-packages/librosa/effects.py", line 448, in _signal_to_frame_nonsilent
mse = feature.rms(y=y_mono, frame_length=frame_length, hop_length=hop_length) ** 2
File "/opt/conda/lib/python3.7/site-packages/librosa/feature/spectral.py", line 925, in rms
y = np.pad(y, int(frame_length // 2), mode=pad_mode)
File "<__array_function__ internals>", line 6, in pad
File "/opt/conda/lib/python3.7/site-packages/numpy/lib/arraypad.py", line 816, in pad
"'constant' or 'empty'".format(axis)
ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'
2021-03-03 11:54:54.504139: W tensorflow/core/framework/op_kernel.cc:1490] Invalid argument: ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/script_ops.py", line 209, in __call__
ret = func(*args)
File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/feeder_wav.py", line 49, in _parse_func
fbanks = self._process_wave(wav_file.decode(), num_frames)
File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/feeder_wav.py", line 30, in _process_wave
wav = audio.trim_silence(wav, audio_hparams)
File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/datasets/audio.py", line 36, in trim_silence
return librosa.effects.trim(wav, top_db= hparams.trim_top_db, frame_length=hparams.trim_fft_size, hop_length=hparams.trim_hop_size)[0]
File "/opt/conda/lib/python3.7/site-packages/librosa/effects.py", line 498, in trim
y, frame_length=frame_length, hop_length=hop_length, ref=ref, top_db=top_db
File "/opt/conda/lib/python3.7/site-packages/librosa/effects.py", line 448, in _signal_to_frame_nonsilent
mse = feature.rms(y=y_mono, frame_length=frame_length, hop_length=hop_length) ** 2
File "/opt/conda/lib/python3.7/site-packages/librosa/feature/spectral.py", line 925, in rms
y = np.pad(y, int(frame_length // 2), mode=pad_mode)
File "<__array_function__ internals>", line 6, in pad
File "/opt/conda/lib/python3.7/site-packages/numpy/lib/arraypad.py", line 816, in pad
"'constant' or 'empty'".format(axis)
ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'
Exiting due to exception: ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/script_ops.py", line 209, in __call__
ret = func(*args)
File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/feeder_wav.py", line 49, in _parse_func
fbanks = self._process_wave(wav_file.decode(), num_frames)
File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/feeder_wav.py", line 30, in _process_wave
wav = audio.trim_silence(wav, audio_hparams)
File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/datasets/audio.py", line 36, in trim_silence
return librosa.effects.trim(wav, top_db= hparams.trim_top_db, frame_length=hparams.trim_fft_size, hop_length=hparams.trim_hop_size)[0]
File "/opt/conda/lib/python3.7/site-packages/librosa/effects.py", line 498, in trim
y, frame_length=frame_length, hop_length=hop_length, ref=ref, top_db=top_db
File "/opt/conda/lib/python3.7/site-packages/librosa/effects.py", line 448, in _signal_to_frame_nonsilent
mse = feature.rms(y=y_mono, frame_length=frame_length, hop_length=hop_length) ** 2
File "/opt/conda/lib/python3.7/site-packages/librosa/feature/spectral.py", line 925, in rms
y = np.pad(y, int(frame_length // 2), mode=pad_mode)
File "<__array_function__ internals>", line 6, in pad
File "/opt/conda/lib/python3.7/site-packages/numpy/lib/arraypad.py", line 816, in pad
"'constant' or 'empty'".format(axis)
ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'
[[{{node PyFunc}}]]
[[IteratorGetNext]]
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/script_ops.py", line 209, in __call__
ret = func(*args)
File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/feeder_wav.py", line 49, in _parse_func
fbanks = self._process_wave(wav_file.decode(), num_frames)
File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/feeder_wav.py", line 30, in _process_wave
wav = audio.trim_silence(wav, audio_hparams)
File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/datasets/audio.py", line 36, in trim_silence
return librosa.effects.trim(wav, top_db= hparams.trim_top_db, frame_length=hparams.trim_fft_size, hop_length=hparams.trim_hop_size)[0]
File "/opt/conda/lib/python3.7/site-packages/librosa/effects.py", line 498, in trim
y, frame_length=frame_length, hop_length=hop_length, ref=ref, top_db=top_db
File "/opt/conda/lib/python3.7/site-packages/librosa/effects.py", line 448, in _signal_to_frame_nonsilent
mse = feature.rms(y=y_mono, frame_length=frame_length, hop_length=hop_length) ** 2
File "/opt/conda/lib/python3.7/site-packages/librosa/feature/spectral.py", line 925, in rms
y = np.pad(y, int(frame_length // 2), mode=pad_mode)
File "<__array_function__ internals>", line 6, in pad
File "/opt/conda/lib/python3.7/site-packages/numpy/lib/arraypad.py", line 816, in pad
"'constant' or 'empty'".format(axis)
ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'
[[{{node PyFunc}}]]
[[IteratorGetNext]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 113, in <module>
step, _, loss, acc = sess.run(fetches=fetches, feed_dict=feed_dict)
File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/script_ops.py", line 209, in __call__
ret = func(*args)
File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/feeder_wav.py", line 49, in _parse_func
fbanks = self._process_wave(wav_file.decode(), num_frames)
File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/feeder_wav.py", line 30, in _process_wave
wav = audio.trim_silence(wav, audio_hparams)
File "/home/rjain/TTS_Exp_and_Data/tf_multispeakerTTS_fc/deep_speaker/datasets/audio.py", line 36, in trim_silence
return librosa.effects.trim(wav, top_db= hparams.trim_top_db, frame_length=hparams.trim_fft_size, hop_length=hparams.trim_hop_size)[0]
File "/opt/conda/lib/python3.7/site-packages/librosa/effects.py", line 498, in trim
y, frame_length=frame_length, hop_length=hop_length, ref=ref, top_db=top_db
File "/opt/conda/lib/python3.7/site-packages/librosa/effects.py", line 448, in _signal_to_frame_nonsilent
mse = feature.rms(y=y_mono, frame_length=frame_length, hop_length=hop_length) ** 2
File "/opt/conda/lib/python3.7/site-packages/librosa/feature/spectral.py", line 925, in rms
y = np.pad(y, int(frame_length // 2), mode=pad_mode)
File "<__array_function__ internals>", line 6, in pad
File "/opt/conda/lib/python3.7/site-packages/numpy/lib/arraypad.py", line 816, in pad
"'constant' or 'empty'".format(axis)
ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'
[[{{node PyFunc}}]]
[[IteratorGetNext]]
What do you think I should do? Or what could be the reason for this error? Thanks in advance.
I would recommend checking your data first. It should not take too long. For this specific issue, you can simply use a for loop to load all your data and call audio.trim_silence(wav, audio_hparams) on each file. Then you will know which file causes the problem, and you can look at that wav file and address the issue.
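As a rough sketch of such a check: the loop below walks a dataset folder and flags wav files that are empty, very short, or unreadable. Note that it swaps in the standard-library wave module instead of the repo's own loading code and audio.trim_silence, so it only catches the obvious cases. The function name find_suspect_wavs and the min_seconds threshold are assumptions made for illustration.

```python
import os
import wave


def find_suspect_wavs(root, min_seconds=0.1):
    """Return (path, reason) pairs for wav files that look problematic."""
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(".wav"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with wave.open(path, "rb") as wav_file:
                    n_frames = wav_file.getnframes()
                    duration = n_frames / float(wav_file.getframerate())
            except (wave.Error, EOFError, ZeroDivisionError) as exc:
                # The wave module only reads PCM files, so other encodings
                # also end up here and need a manual look.
                suspects.append((path, "unreadable: {}".format(exc)))
                continue
            if n_frames == 0:
                suspects.append((path, "empty"))
            elif duration < min_seconds:
                suspects.append((path, "too short: {:.3f}s".format(duration)))
    return suspects
```

Calling find_suspect_wavs on the folder that holds the preprocessed wav files and printing the result gives a list of files to inspect by hand. To test the exact failing call instead, the same loop can wrap the repo's loading code and audio.trim_silence(wav, audio_hparams) in a try/except and record the path whenever a ValueError is raised.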
Okay. I will do that. Thanks
Hi @caizexin ,
I have been trying to implement your work on my own dataset. I am trying to run the speaker embedding network in the deep_speaker folder using python train.py, but I keep running into this error:
I have preprocessed the dataset as per the instructions mentioned in the repo. Here is an example: data_voxtest.zip
I can't really find a workaround for this. Any help is appreciated.
Thanks in advance.