andabi / deep-voice-conversion

Deep neural networks for voice conversion (voice style transfer) in Tensorflow
MIT License
3.92k stars 843 forks

May I ask your hardware? #2

Open chikiuso opened 6 years ago

chikiuso commented 6 years ago

Hi, may I ask about your hardware specification? For example, which graphics card did you use, and how much RAM does your PC have? Thanks.

andabi commented 6 years ago

@chikiuso I used a server with 8 Nvidia Tesla P40 GPUs and 200 GB of memory, but a single GPU was enough.

HudsonHuang commented 6 years ago

@andabi

I ran your code on a server like this:

but the progress always stops at "Creating TensorFlow device" and doesn't show any more information.

andabi commented 6 years ago

@HudsonHuang Please check the paths (data_path, etc.) in hparam.py again. If the paths are set incorrectly, the problem you mentioned can happen.

tbfly commented 6 years ago

I use:

    name: GeForce GTX 750 Ti major: 5 minor: 0 memoryClockRate(GHz): 1.0845
    pciBusID: 0000:01:00.0
    totalMemory: 976.12MiB freeMemory: 948.88MiB

I met a similar issue: when ffmpeg is installed, the progress always stops at "Creating TensorFlow device" and doesn't show any more information. Using "ps aux" I found multiple ffmpeg processes in Z (zombie) status.

If I uninstall ffmpeg, it reports:

    OutOfRangeError (see above for traceback): PaddingFIFOQueue '_2_batch/padding_fifo_queue' is closed and has insufficient elements (requested 32, current size 0)
         [[Node: batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/padding_fifo_queue, batch/n)]]

andabi commented 6 years ago

@tbfly I think the queue runner is not working properly. To debug, set queue=False in train1.py, run again, and see what message comes up.
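With queue=False the batch is fetched in plain Python and fed through the placeholders, so a data-loading failure surfaces as an ordinary exception instead of a silently closed queue. A minimal sketch of that path (it reuses the get_batch call quoted later in this thread; not the exact train1.py code):

```python
# Sketch of the non-queue debugging path; assumes get_batch(mode, batch_size)
# returns (mfcc, ppg) as shown in data_load.py elsewhere in this thread.
mfcc, ppg = get_batch(model.mode, model.batch_size)
print(mfcc.shape, ppg.shape)  # sanity-check that a full batch was actually loaded
sess.run(train_op, feed_dict={model.x_mfcc: mfcc, model.y_ppgs: ppg})
```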

HudsonHuang commented 6 years ago

@andabi I changed the path to the correct one, but it raises:

    Traceback (most recent call last):
      File "train1.py", line 88, in <module>
        train(logdir=logdir)
      File "train1.py", line 58, in train
        mfcc, ppg = get_batch(model.mode, model.batch_size)
      File "/home/lab-huang.zhongyi/workspace/deep-voice-conversion-master/data_load.py", line 256, in get_batch
        target_wavs = sample(wav_files, batch_size)
      File "/home/lab-huang.zhongyi/anaconda3/envs/tensorflow/lib/python3.5/random.py", line 324, in sample
        raise ValueError("Sample larger than population")
    ValueError: Sample larger than population

I checked that batch_size was not changed; it was still 32.

andabi commented 6 years ago

@HudsonHuang That error occurs when the number of wav files is smaller than the batch size.
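A quick way to confirm this is to count the files matched by the training glob before starting. A sketch (the actual pattern lives in hparam.py / data_load.py; the path below is only a placeholder):

```python
import glob

# Hypothetical sanity check: count the wav files the training glob actually matches.
# Replace the placeholder path with the data_path configured in hparam.py.
data_path = '/path/to/datasets/timit/TIMIT/TRAIN/*/*/*.wav'
wav_files = glob.glob(data_path)
print(len(wav_files))  # must be >= batch_size (32 by default), or random.sample() raises
```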

HudsonHuang commented 6 years ago

@andabi And I checked len(wav_files): it is 0, so it seems it didn't get the files. I printed the path used in this line of data_load.py, and it was /home/lab-huang.zhongyi/workspace/deep-voice-conversion-master/datasets/timit/TIMIT/TRAIN/*/*/*.wav, and my path is: [screenshot] Is that a correct path?

Thank you so much.

tbfly commented 6 years ago

@andabi

    2017-11-24 10:20:57.861347: I tensorflow/core/common_runtime/bfc_allocator.cc:683] Sum Total of in-use chunks: 685.56MiB
    2017-11-24 10:20:57.861369: I tensorflow/core/common_runtime/bfc_allocator.cc:685] Stats:
    Limit:        759037952
    InUse:        718866432
    MaxInUse:     725436928
    NumAllocs:    1411
    MaxAllocSize: 156712960

    2017-11-24 10:20:57.861511: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ****
    2017-11-24 10:20:57.861549: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[32,401,2048]

It looks "GeForce GTX 750 Ti " ResourceExhaustedError when use GPU mode. 😢

tbfly commented 6 years ago

    Traceback (most recent call last):
      File "train1.py", line 102, in <module>
        train(logdir=logdir)
      File "train1.py", line 76, in train
        summ, gs = sess.run([summ_op, global_step])
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 889, in run
        run_metadata_ptr)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1120, in _run
        feed_dict_tensor, options, run_metadata)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1317, in _do_run
        options, run_metadata)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1336, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype float and shape [32,?,40]
         [[Node: Placeholder = Placeholder[dtype=DT_FLOAT, shape=[32,?,40], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

    Caused by op u'Placeholder', defined at:
      File "train1.py", line 102, in <module>
        train(logdir=logdir)
      File "train1.py", line 18, in train
        model = Model(mode="train1", batch_size=hp.Train1.batch_size, queue=queue)
      File "/home/lite/deep-voice-conversion/models.py", line 22, in __init__
        self.x_mfcc, self.y_ppgs, self.y_spec, self.y_mel, self.num_batch = self.get_input(mode, batch_size, queue)
      File "/home/lite/deep-voice-conversion/models.py", line 43, in get_input
        x_mfcc = tf.placeholder(tf.float32, shape=(batch_size, None, hp_default.n_mfcc))
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 1599, in placeholder
        return gen_array_ops._placeholder(dtype=dtype, shape=shape, name=name)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 3091, in _placeholder
        "Placeholder", dtype=dtype, shape=shape, name=name)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
        op_def=op_def)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
        op_def=op_def)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
        self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

    InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder' with dtype float and shape [32,?,40]
         [[Node: Placeholder = Placeholder[dtype=DT_FLOAT, shape=[32,?,40], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

The error above is what I get if I force CPU mode by setting CUDA_VISIBLE_DEVICES="" and set queue = False.

tbfly commented 6 years ago

In CPU mode with queue = False, the relevant lines are:

    mfcc, ppg = get_batch(model.mode, model.batch_size)
    sess.run(train_op, feed_dict={model.x_mfcc: mfcc, model.y_ppgs: ppg})

Does get_batch not return the correct tensors?

HudsonHuang commented 6 years ago

@tbfly I think your problem is similar to mine; I didn't get correct data to feed. May I ask what your path to the TIMIT dataset is? Does it look like this? [screenshot]

SriramS32 commented 6 years ago

@HudsonHuang your problem may be that the '.wav' at the end of the path is case sensitive and your files have .WAV instead:

    data_path = '{}/timit/TIMIT/TRAIN/*/*/*.WAV'.format(data_path_base)

I have the same problem as @tbfly and am trying to figure it out.
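If editing the path or renaming the files is not an option, one small (hypothetical) workaround is to glob both spellings instead of relying on a single pattern:

```python
import glob

# Hypothetical helper: match both lower- and upper-case extensions, since the
# original TIMIT release ships .WAV while the default pattern expects .wav.
def find_wavs(data_path_base):
    pattern = '{}/timit/TIMIT/TRAIN/*/*/'.format(data_path_base)
    return glob.glob(pattern + '*.wav') + glob.glob(pattern + '*.WAV')
```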

tbfly commented 6 years ago

@SriramS32 I think it is an OS-dependence or version-conflict issue. The same code runs perfectly on OSX, but when I run it on Linux64 it causes the issue.

tbfly commented 6 years ago

@SriramS32 @andabi It seems to be caused by tf.summary.merge_all.

SriramS32 commented 6 years ago

@tbfly Yes, you are right. I can train on OSX with queue=True without any problems. Unfortunately, I don't have GPU support on non-Linux machines. Interesting: is it expecting us to fill the placeholder even when we only run the summ_op?

tbfly commented 6 years ago
-            summ, gs = sess.run([summ_op, global_step])
+            if queue:
+                summ, gs = sess.run([summ_op, global_step])
+            else:
+                summ, gs = sess.run([summ_op, global_step], feed_dict={model.x_mfcc: mfcc, model.y_ppgs: ppg})

@SriramS32 @andabi It seems this modification fixes the error in queue=False mode: tf.summary.merge_all collects summary ops that depend on the placeholders, so in non-queue mode the summary run also needs a feed_dict. For queue=True mode on Linux64, I need more time to figure out why it doesn't work. 😄

wyn314 commented 6 years ago

@tbfly Have you found out why it doesn't work in queue=True mode?

tiankong-hut commented 6 years ago

@tbfly I met the same problem; did you resolve it? Thank you!

    Traceback (most recent call last):
      File "/home/human-machine/Speech/deep-voice-conversion-master/train1.py", line 90, in <module>
        train(logdir='./logdir/default/train1', queue=True)
      File "/home/human-machine/Speech/deep-voice-conversion-master/train1.py", line 57, in train
        sess.run(train_op)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
        run_metadata_ptr)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
        feed_dict_string, options, run_metadata)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
        target_list, options, run_metadata)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_batch/padding_fifo_queue' is closed and has insufficient elements (requested 32, current size 0)
         [[Node: batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/padding_fifo_queue, batch/n)]]

tiankong-hut commented 6 years ago

I have resolved my problem; just install ffmpeg.
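For anyone who lands here with the same closed-queue error, a quick sanity check (a sketch; it assumes the loader decodes audio via librosa/audioread, which falls back to ffmpeg for files the default backend cannot read) is to confirm that ffmpeg is on the PATH and that one training file actually decodes:

```python
import shutil
import librosa

# Sketch: verify ffmpeg is installed and that a single TIMIT file can be decoded.
# The file path is a placeholder; point it at any wav under your TRAIN directory.
print(shutil.which('ffmpeg'))   # should print a path, not None
wav, sr = librosa.load('/path/to/TIMIT/TRAIN/DR1/FCJF0/SA1.WAV', sr=None)
print(wav.shape, sr)
```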