Kyubyong / tacotron

A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model
Apache License 2.0

sanity_check = False crashes training #91

Open aijanai opened 7 years ago

aijanai commented 7 years ago

When I set sanity_check = False, training crashes with the following output:

ubuntu@ip-172-31-13-191:~/tacotron$ python3 train.py 
Training Graph loaded
2017-11-04 08:15:28.431148: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2017-11-04 08:15:28.601591: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-11-04 08:15:28.601950: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1031] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8755
pciBusID: 0000:00:1e.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2017-11-04 08:15:28.601991: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
  0%|                                          | 0/384 [00:00<?, ?b/s]
2017-11-04 08:15:40.637914: W tensorflow/core/framework/op_kernel.cc:1192] Out of range: PaddingFIFOQueue '_1_batch/padding_fifo_queue' is closed and has insufficient elements (requested 32, current size 1)
         [[Node: batch = QueueDequeueManyV2[component_types=[DT_INT32, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/padding_fifo_queue, batch/n)]]
2017-11-04 08:15:40.639653: W tensorflow/core/framework/op_kernel.cc:1192] Out of range: PaddingFIFOQueue '_1_batch/padding_fifo_queue' is closed and has insufficient elements (requested 32, current size 1)
         [[Node: batch = QueueDequeueManyV2[component_types=[DT_INT32, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/padding_fifo_queue, batch/n)]]

[REPEATS FOR A WHILE]

         [[Node: batch = QueueDequeueManyV2[component_types=[DT_INT32, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/padding_fifo_queue, batch/n)]]
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
    status, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_batch/padding_fifo_queue' is closed and has insufficient elements (requested 32, current size 1)
         [[Node: batch = QueueDequeueManyV2[component_types=[DT_INT32, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/padding_fifo_queue, batch/n)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/supervisor.py", line 962, in managed_session
    yield sess
  File "train.py", line 121, in main
    sess.run(g.train_op)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_batch/padding_fifo_queue' is closed and has insufficient elements (requested 32, current size 1)
         [[Node: batch = QueueDequeueManyV2[component_types=[DT_INT32, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/padding_fifo_queue, batch/n)]]

Caused by op 'batch', defined at:
  File "train.py", line 128, in <module>
    main()
  File "train.py", line 108, in main
    g = Graph(); print("Training Graph loaded")
  File "train.py", line 35, in __init__
    self.x, self.y, self.z, self.num_batch = get_batch()
  File "/home/ubuntu/tacotron/data_load.py", line 166, in get_batch
    dynamic_pad=True)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/input.py", line 911, in batch
    name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/input.py", line 706, in _batch
    dequeued = queue.dequeue_many(batch_size, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 464, in dequeue_many
    self._queue_ref, n=n, component_types=self._dtypes, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 2418, in _queue_dequeue_many_v2
    component_types=component_types, timeout_ms=timeout_ms, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2991, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1479, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

OutOfRangeError (see above for traceback): PaddingFIFOQueue '_1_batch/padding_fifo_queue' is closed and has insufficient elements (requested 32, current size 1)
         [[Node: batch = QueueDequeueManyV2[component_types=[DT_INT32, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](batch/padding_fifo_queue, batch/n)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 128, in <module>
    main()
  File "train.py", line 125, in main
    sv.saver.save(sess, hp.logdir + '/model_epoch_%02d_gs_%d' % (epoch, gs))
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/supervisor.py", line 972, in managed_session
    self.stop(close_summary_writer=close_summary_writer)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/supervisor.py", line 800, in stop
    ignore_live_threads=ignore_live_threads)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/usr/local/lib/python3.5/dist-packages/six.py", line 693, in reraise
    raise value
  File "/home/ubuntu/tacotron/data_load.py", line 93, in _run
    self.func(sess, enqueue_op)  # call enqueue function
  File "/home/ubuntu/tacotron/data_load.py", line 40, in enqueue_func
    data = func(sess.run(inputs))
  File "/home/ubuntu/tacotron/data_load.py", line 147, in get_text_and_spectrograms
    _spectrogram, _magnitude = get_spectrograms(_sound_file)
  File "/home/ubuntu/tacotron/utils.py", line 28, in get_spectrograms
    y, sr = librosa.load(sound_file, sr=hp.sr) # or set sr to hp.sr.
  File "/usr/local/lib/python3.5/dist-packages/librosa/core/audio.py", line 107, in load
    with audioread.audio_open(os.path.realpath(path)) as input_file:
  File "/usr/local/lib/python3.5/dist-packages/audioread/__init__.py", line 80, in audio_open
    return rawread.RawAudioFile(path)
  File "/usr/local/lib/python3.5/dist-packages/audioread/rawread.py", line 64, in __init__
    self._file = aifc.open(self._fh)
  File "/usr/lib/python3.5/aifc.py", line 890, in open
    return Aifc_read(f)
  File "/usr/lib/python3.5/aifc.py", line 340, in __init__
    self.initfp(f)
  File "/usr/lib/python3.5/aifc.py", line 303, in initfp
    chunk = Chunk(file)
  File "/usr/lib/python3.5/chunk.py", line 63, in __init__
    raise EOFError
EOFError

eazhary commented 7 years ago

Check the audio files. It looks like one of them cannot be opened by librosa. I had a similar problem when I was using mp3 files: the queue handler couldn't handle calling an external command (ffmpeg). Is it possible that you are not using wav files?
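
One way to act on this is to scan the dataset for files that fail to open before starting training. The sketch below is only an illustration and is not part of the tacotron repository: `find_unreadable_wavs` is a hypothetical helper, it assumes the corpus consists of PCM `.wav` files, and it uses Python's standard `wave` module rather than librosa, so its verdicts may differ from librosa's for unusual encodings. The `EOFError` at the bottom of the traceback is raised while the first chunk header is being read, which is consistent with an empty or truncated file, and that is the case this check is meant to catch.

```python
import os
import wave


def find_unreadable_wavs(directory):
    """Return (path, reason) pairs for .wav files that cannot be read.

    Hypothetical helper: walks `directory`, tries to open every file
    ending in .wav with the standard-library `wave` module, and records
    the ones that raise an error or contain no audio frames.
    """
    bad = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            if not name.lower().endswith(".wav"):
                continue  # other formats are outside the scope of this check
            path = os.path.join(root, name)
            try:
                with wave.open(path, "rb") as wav:
                    if wav.getnframes() == 0:
                        bad.append((path, "no audio frames"))
            except (wave.Error, EOFError, OSError) as exc:
                # EOFError: empty or truncated file; wave.Error: not a PCM WAV
                bad.append((path, "{}: {}".format(type(exc).__name__, exc)))
    return bad
```

Calling `find_unreadable_wavs("path/to/wavs")` (the path is a placeholder) and printing each returned pair should point at the files worth inspecting by hand. A file flagged here is a hint rather than proof, since librosa may still be able to decode some files that `wave` rejects.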