NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
https://nvidia.github.io/OpenSeq2Seq
Apache License 2.0

Determined shape must either match input shape along split_dim exactly if fully specified, or be less than the size of the input along split_dim if not fully specified #431

Closed: vinnitu closed this issue 5 years ago

vinnitu commented 5 years ago

Before training, I modified example_configs/speech2text/ds2_small_1gpu.py:

...
train_params = {
  "data_layer": Speech2TextDataLayer,
  "data_layer_params": {
    "backend": "librosa",  # <-- changed this line
    "num_audio_features": 96,
...

After 40,000 training iterations, I stopped training.

Before inference, I added the following to example_configs/speech2text/ds2_small_1gpu.py:

infer_params = {
    "data_layer": Speech2TextDataLayer,
    "data_layer_params": {
        "backend": "librosa",
        "num_audio_features": 64,
        "input_type": "logfbank",
        "vocab_file": "open_seq2seq/test_utils/toy_speech_data/vocab.txt",
        "dataset_files": [
            "data/librispeech/librivox-test-clean.csv",
        ],
        "shuffle": False,
    },
}

but I got this error:

$ pipenv run python run.py --config_file=example_configs/speech2text/ds2_small_1gpu.py --mode=infer --infer_output_file=ds2_out.txt

Loading .env environment variables...

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

*** Restoring from the latest checkpoint
*** Loading model from experiments/librispeech-quick/model.ckpt-40000
*** Inference config:
{'batch_size_per_gpu': 32,
 'data_layer': <class 'open_seq2seq.data.speech2text.speech2text.Speech2TextDataLayer'>,
 'data_layer_params': {'backend': 'librosa',
                       'dataset_files': ['data/librispeech/librivox-test-clean.csv'],
                       'input_type': 'logfbank',
                       'num_audio_features': 64,
                       'shuffle': False,
                       'vocab_file': 'open_seq2seq/test_utils/toy_speech_data/vocab.txt'},
 'decoder': <class 'open_seq2seq.decoders.fc_decoders.FullyConnectedCTCDecoder'>,
 'decoder_params': {'alpha': 2.0,
                    'alphabet_config_path': 'open_seq2seq/test_utils/toy_speech_data/vocab.txt',
                    'beam_width': 512,
                    'beta': 1.0,
                    'decoder_library_path': 'ctc_decoder_with_lm/libctc_decoder_with_kenlm.so',
                    'lm_path': 'language_model/4-gram.binary',
                    'trie_path': 'language_model/trie.binary',
                    'use_language_model': False},
 'dtype': tf.float32,
 'encoder': <class 'open_seq2seq.encoders.ds2_encoder.DeepSpeech2Encoder'>,
 'encoder_params': {'activation_fn': <function relu at 0x7f379e02c6a8>,
                    'conv_layers': [{'kernel_size': [11, 41],
                                     'num_channels': 32,
                                     'padding': 'SAME',
                                     'stride': [2, 2]},
                                    {'kernel_size': [11, 21],
                                     'num_channels': 32,
                                     'padding': 'SAME',
                                     'stride': [1, 2]}],
                    'data_format': 'channels_first',
                    'dropout_keep_prob': 0.5,
                    'n_hidden': 1024,
                    'num_rnn_layers': 2,
                    'rnn_cell_dim': 512,
                    'rnn_type': 'cudnn_gru',
                    'rnn_unidirectional': False,
                    'row_conv': False,
                    'use_cudnn_rnn': True},
 'eval_steps': 5000,
 'initializer': <function xavier_initializer at 0x7f377b2890d0>,
 'load_model': '',
 'logdir': 'experiments/librispeech-quick',
 'loss': <class 'open_seq2seq.losses.ctc_loss.CTCLoss'>,
 'loss_params': {},
 'lr_policy': <function exp_decay at 0x7f3774d8e7b8>,
 'lr_policy_params': {'begin_decay_at': 0,
                      'decay_rate': 0.9,
                      'decay_steps': 5000,
                      'learning_rate': 0.0001,
                      'min_lr': 0.0,
                      'use_staircase_decay': True},
 'num_epochs': 12,
 'num_gpus': 1,
 'optimizer': 'Adam',
 'optimizer_params': {},
 'print_loss_steps': 10,
 'print_samples_steps': 5000,
 'random_seed': 0,
 'regularizer': <function l2_regularizer at 0x7f377c35fc80>,
 'regularizer_params': {'scale': 0.0005},
 'save_checkpoint_steps': 1000,
 'save_summaries_steps': 100,
 'summaries': ['learning_rate',
               'variables',
               'gradients',
               'larc_summaries',
               'variable_norm',
               'gradient_norm',
               'global_gradient_norm'],
 'use_horovod': False,
 'use_xla_jit': False}
*** Building graph on GPU:0
WARNING:tensorflow:From /home/ai/projects/OpenSeq2Seq/open_seq2seq/data/speech2text/speech2text.py:256: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, use
    tf.py_function, which takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.

WARNING:tensorflow:From /home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py:1419: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/ai/projects/OpenSeq2Seq/open_seq2seq/parts/cnns/conv_blocks.py:159: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
WARNING:tensorflow:From /home/ai/projects/OpenSeq2Seq/open_seq2seq/parts/cnns/conv_blocks.py:177: batch_normalization (from tensorflow.python.layers.normalization) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.batch_normalization instead.
WARNING:tensorflow:From /home/ai/projects/OpenSeq2Seq/open_seq2seq/encoders/ds2_encoder.py:387: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dense instead.
WARNING:tensorflow:From /home/ai/projects/OpenSeq2Seq/open_seq2seq/encoders/ds2_encoder.py:389: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
*** Inference Mode. Loss part of graph isn't built.
2019-05-15 09:35:22.343169: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-05-15 09:35:22.750170: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x1a69d90 executing computations on platform CUDA. Devices:
2019-05-15 09:35:22.750219: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce GTX 1080, Compute Capability 6.1
2019-05-15 09:35:22.770570: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3298335000 Hz
2019-05-15 09:35:22.771645: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x1976de0 executing computations on platform Host. Devices:
2019-05-15 09:35:22.771686: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-05-15 09:35:22.772724: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.911
pciBusID: 0000:01:00.0
totalMemory: 7.93GiB freeMemory: 7.82GiB
2019-05-15 09:35:22.772764: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-05-15 09:35:22.774541: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-15 09:35:22.774575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-05-15 09:35:22.774597: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-05-15 09:35:22.775553: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7604 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
WARNING:tensorflow:From /home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Traceback (most recent call last):
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Determined shape must either match input shape along split_dim exactly if fully specified, or be less than the size of the input along split_dim if not fully specified.  Got: 1024
     [[{{node save/split_7}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1276, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Determined shape must either match input shape along split_dim exactly if fully specified, or be less than the size of the input along split_dim if not fully specified.  Got: 1024
     [[node save/split_7 (defined at /home/ai/projects/OpenSeq2Seq/open_seq2seq/utils/funcs.py:209) ]]

Caused by op 'save/split_7', defined at:
  File "run.py", line 103, in <module>
    main()
  File "run.py", line 93, in main
    infer(model, checkpoint, args.infer_output_file)
  File "/home/ai/projects/OpenSeq2Seq/open_seq2seq/utils/funcs.py", line 226, in infer
    results_per_batch = restore_and_get_results(model, checkpoint, mode="infer")
  File "/home/ai/projects/OpenSeq2Seq/open_seq2seq/utils/funcs.py", line 209, in restore_and_get_results
    saver = tf.train.Saver()
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 832, in __init__
    self.build()
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 844, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 881, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 513, in _build_internal
    restore_sequentially, reshape)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 354, in _AddRestoreOps
    assign_ops.append(saveable.restore(saveable_tensors, shapes))
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 736, in restore
    restored_tensors)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 217, in tf_canonical_to_opaque
    cu_weights, cu_biases = self._tf_canonical_to_cu_canonical(tf_canonicals)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 382, in _tf_canonical_to_cu_canonical
    cu_weights.extend(self._tf_to_cudnn_weights(i, *bw_weights))
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 530, in _tf_to_cudnn_weights
    w_r, r_r = array_ops.split(W_r, [input_weight_width, num_units], axis=1)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1518, in split
    value=value, size_splits=size_splits, axis=axis, num_split=num, name=name)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 8953, in split_v
    num_split=num_split, name=name)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Determined shape must either match input shape along split_dim exactly if fully specified, or be less than the size of the input along split_dim if not fully specified.  Got: 1024
     [[node save/split_7 (defined at /home/ai/projects/OpenSeq2Seq/open_seq2seq/utils/funcs.py:209) ]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run.py", line 103, in <module>
    main()
  File "run.py", line 93, in main
    infer(model, checkpoint, args.infer_output_file)
  File "/home/ai/projects/OpenSeq2Seq/open_seq2seq/utils/funcs.py", line 226, in infer
    results_per_batch = restore_and_get_results(model, checkpoint, mode="infer")
  File "/home/ai/projects/OpenSeq2Seq/open_seq2seq/utils/funcs.py", line 218, in restore_and_get_results
    saver.restore(sess, checkpoint)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1312, in restore
    err, "a mismatch between the current graph and the graph")
tensorflow.python.framework.errors_impl.InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Determined shape must either match input shape along split_dim exactly if fully specified, or be less than the size of the input along split_dim if not fully specified.  Got: 1024
     [[node save/split_7 (defined at /home/ai/projects/OpenSeq2Seq/open_seq2seq/utils/funcs.py:209) ]]

Caused by op 'save/split_7', defined at:
  File "run.py", line 103, in <module>
    main()
  File "run.py", line 93, in main
    infer(model, checkpoint, args.infer_output_file)
  File "/home/ai/projects/OpenSeq2Seq/open_seq2seq/utils/funcs.py", line 226, in infer
    results_per_batch = restore_and_get_results(model, checkpoint, mode="infer")
  File "/home/ai/projects/OpenSeq2Seq/open_seq2seq/utils/funcs.py", line 209, in restore_and_get_results
    saver = tf.train.Saver()
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 832, in __init__
    self.build()
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 844, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 881, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 513, in _build_internal
    restore_sequentially, reshape)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 354, in _AddRestoreOps
    assign_ops.append(saveable.restore(saveable_tensors, shapes))
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 736, in restore
    restored_tensors)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 217, in tf_canonical_to_opaque
    cu_weights, cu_biases = self._tf_canonical_to_cu_canonical(tf_canonicals)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 382, in _tf_canonical_to_cu_canonical
    cu_weights.extend(self._tf_to_cudnn_weights(i, *bw_weights))
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 530, in _tf_to_cudnn_weights
    w_r, r_r = array_ops.split(W_r, [input_weight_width, num_units], axis=1)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1518, in split
    value=value, size_splits=size_splits, axis=axis, num_split=num, name=name)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 8953, in split_v
    num_split=num_split, name=name)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/ai/projects/OpenSeq2Seq/.venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Determined shape must either match input shape along split_dim exactly if fully specified, or be less than the size of the input along split_dim if not fully specified.  Got: 1024
     [[node save/split_7 (defined at /home/ai/projects/OpenSeq2Seq/open_seq2seq/utils/funcs.py:209) ]]

Environment: libcudnn.so.7.5.0, CUDA 10.0, TensorFlow 1.13.1, GeForce GTX 1080, Ubuntu 18.04.2 LTS.
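
As a side note, a minimal sketch (assuming TensorFlow 1.x) for listing the variable names and shapes stored in the checkpoint, which makes this kind of graph/checkpoint shape mismatch easier to spot; the checkpoint path is the one from the log above:

import tensorflow as tf

# Checkpoint path taken from the log above.
ckpt = "experiments/librispeech-quick/model.ckpt-40000"

# tf.train.list_variables returns (name, shape) pairs for every variable
# saved in the checkpoint, without rebuilding the model graph, so the saved
# shapes can be compared against what the inference graph expects.
for name, shape in tf.train.list_variables(ckpt):
    print(name, shape)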

vsl9 commented 5 years ago

Please make sure that the "data_layer_params" used for audio feature extraction are the same for training and inference. There is a mismatch in num_audio_features: 96 vs. 64. I suspect the same applies to input_type: spectrogram (the default for DS2) vs. logfbank.
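
In other words, the inference data layer should reuse exactly the same feature-extraction settings as training and change only the dataset files, presumably because num_audio_features and input_type determine the width of the features fed into the encoder, so changing them means the cuDNN GRU weight tensors in the newly built graph no longer have the shapes stored in the checkpoint and the restore fails inside save/split_7. A sketch of a matching infer_params, assuming the training config used the DS2 default input_type of "spectrogram" (the training snippet above does not show it), could look like this:

# Sketch: feature extraction matching the training run above. "librosa" and
# num_audio_features 96 are taken from the training snippet; "spectrogram"
# is an assumption, since the training input_type is not shown in the report.
infer_params = {
    "data_layer": Speech2TextDataLayer,
    "data_layer_params": {
        "backend": "librosa",         # same as training
        "num_audio_features": 96,     # same as training (the report used 64 here)
        "input_type": "spectrogram",  # must match whatever training used
        "vocab_file": "open_seq2seq/test_utils/toy_speech_data/vocab.txt",
        "dataset_files": [
            "data/librispeech/librivox-test-clean.csv",
        ],
        "shuffle": False,
    },
}

Only dataset_files (and shuffle) should normally differ between the train and infer data layers; everything that affects the feature shape has to stay identical, otherwise the inference graph will not match the checkpoint.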