TensorSpeech / TensorFlowASR

:zap: TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2. Supported languages that can use characters or subwords
https://huylenguyen.com/asr
Apache License 2.0
938 stars 245 forks

Issues with example configs for Streaming Transducer #176

Closed AgaDob closed 3 years ago

AgaDob commented 3 years ago

Hi! I have cloned the latest version of the repo and I am running into two issues when using the example configs for the streaming transducer model:

1) The test_streaming_transducer.py script only runs with a batch size of 1. With any other batch size, e.g. 3, I get: ValueError: Dimension 0 in both shapes must be equal, but are 3 and 1. Shapes are [3,1024] and [1,1024].
2) When I train with a batch size of 1 on LibriSpeech and then run test_streaming_transducer.py, the model outputs only blank strings for all audio files.

Any ideas? I include the full traceback for issue 1) below:

Traceback (most recent call last):
  File "examples/streaming_transducer/test_streaming_transducer.py", line 88, in <module>
    streaming_transducer_tester.run(test_dataset)
  File "/home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow_asr/runners/base_runners.py", line 403, in run
    self._test_epoch()
  File "/home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow_asr/runners/base_runners.py", line 414, in _test_epoch
    decoded = self._test_function(test_iter)
  File "/home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 871, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 725, in _initialize
    self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
  File "/home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2969, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3361, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3196, in _create_graph_function
    func_graph_module.func_graph_from_py_func(
  File "/home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 990, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 634, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3887, in bound_method_wrapper
    return wrapped_fn(*args, **kwargs)
  File "/home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 977, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    /home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow_asr/runners/base_runners.py:431 _test_function  *
        return self._test_step(batch)
    /home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow_asr/runners/base_runners.py:449 _test_step  *
        greed_pred = self.model.recognize(features, input_length)
    /home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow_asr/models/streaming_transducer.py:263 recognize  *
        encoded, _ = self.encoder.recognize(features, self.encoder.get_initial_state())
    /home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow_asr/models/streaming_transducer.py:168 recognize  *
        outputs, block_states = block.recognize(outputs, states=tf.unstack(states[i], axis=0))
    /home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow_asr/models/streaming_transducer.py:79 recognize  *
        outputs = self.rnn(outputs, training=False, initial_state=states)
    /home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent.py:717 __call__  **
        return super(RNN, self).__call__(inputs, **kwargs)
    /home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py:1012 __call__
        outputs = call_fn(inputs, *args, **kwargs)
    /home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent_v2.py:1270 call
        runtime) = lstm_with_backend_selection(**normal_lstm_kwargs)
    /home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent_v2.py:1655 lstm_with_backend_selection
        last_output, outputs, new_h, new_c, runtime = defun_standard_lstm(**params)
    /home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/eager/function.py:2941 __call__
        filtered_flat_args) = self._maybe_define_function(args, kwargs)
    /home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/eager/function.py:3361 _maybe_define_function
        graph_function = self._create_graph_function(args, kwargs)
    /home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/eager/function.py:3196 _create_graph_function
        func_graph_module.func_graph_from_py_func(
    /home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py:990 func_graph_from_py_func
        func_outputs = python_func(*func_args, **func_kwargs)
    /home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/keras/layers/recurrent_v2.py:1392 standard_lstm
        last_output, outputs, new_states = K.rnn(
    /home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:201 wrapper
        return target(*args, **kwargs)
    /home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/keras/backend.py:4493 rnn
        final_outputs = control_flow_ops.while_loop(
    /home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/ops/control_flow_ops.py:2687 while_loop
        return while_v2.while_loop(
    /home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/ops/while_v2.py:192 while_loop
        body_graph = func_graph_module.func_graph_from_py_func(
    /home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py:990 func_graph_from_py_func
        func_outputs = python_func(*func_args, **func_kwargs)
    /home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/ops/while_v2.py:178 wrapped_body
        outputs = body(*_pack_sequence_as(orig_loop_vars, args))
    /home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/keras/backend.py:4485 _step
        new_state.set_shape(state.shape)
    /home/usr/miniconda3/envs/tfasr/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:762 set_shape
        raise ValueError(str(e))

ValueError: Dimension 0 in both shapes must be equal, but are 3 and 1. Shapes are [3,1024] and [1,1024].

nglehuy commented 3 years ago

@AgaDob I've updated the fix, try pulling new commits on main and check if it works 😄
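
The fix itself is not quoted in this thread. As a rough illustration of what the traceback complains about (a recurrent state with batch dimension 1 being fed inputs with batch dimension 3), the sketch below sizes the state from the incoming batch instead of hard-coding a batch of 1. It is plain Python rather than TensorFlow, and `make_initial_state` and `check_state` are hypothetical helpers, not TensorFlowASR functions.

```python
def make_initial_state(batch_size, units):
    """Return zero-filled [h, c] LSTM-style states, each of shape [batch_size, units]."""
    h = [[0.0] * units for _ in range(batch_size)]
    c = [[0.0] * units for _ in range(batch_size)]
    return [h, c]


def check_state(features, state):
    """Raise if any state tensor's batch dimension differs from that of the features."""
    batch = len(features)
    for tensor in state:
        if len(tensor) != batch:
            raise ValueError(
                f"state batch dimension {len(tensor)} does not match input batch {batch}"
            )


# Toy batch (illustrative sizes): 3 utterances, 5 frames each, 4 features per frame.
features = [[[0.0] * 4 for _ in range(5)] for _ in range(3)]

# A state sized from the batch passes the check.
good_state = make_initial_state(batch_size=len(features), units=8)
check_state(features, good_state)

# A state hard-coded to batch 1 shows the same kind of mismatch as the traceback.
bad_state = make_initial_state(batch_size=1, units=8)
try:
    check_state(features, bad_state)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

In a real model the same idea would mean deriving the state's leading dimension from the input tensor at call time, so that one code path serves any batch size.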

AgaDob commented 3 years ago

Brilliant, thank you - a variable batch size now works for inference! However, my model still outputs only blank strings at inference - any ideas?

nglehuy commented 3 years ago

@AgaDob Did you test with greedy decoding only? Can you show me a plot of the training losses? An underfit model will produce wrong outputs.

AgaDob commented 3 years ago

@usimarit I've trained a model on train-clean-100 with word-pieces and I think you are right, it is underfit quite a bit - have you experienced this issue when training with word-pieces? As a PoC I've also trained on the test set, and with beam search at a beam width of 2 it outputs the character 't' for every single utterance - I'm just wondering whether it could by any chance be an issue with the code, as opposed to the models?

[attached image]

AgaDob commented 3 years ago

In fact, my test output looks very much like the one in #105, where the model outputs the letter 'i' for every utterance. However, I am saving my results to a new test file each time, so that's not the issue...

PATH    GROUNDTRUTH GREEDY  BEAMSEARCH  BEAMSEARCHLM
/home/usr/datasets/LibriSpeech/test-clean/7021/79730/7021-79730-0000.flac   the three modes of management       i   
/home/usr/datasets/LibriSpeech/test-clean/7021/79730/7021-79730-0001.flac   to suppose that the object of this work is to aid in effecting such a substitution as that is entirely to mistake its nature and design     i   
/home/usr/datasets/LibriSpeech/test-clean/7021/79730/7021-79730-0002.flac   by reason and affection     i   
/home/usr/datasets/LibriSpeech/test-clean/7021/79730/7021-79730-0003.flac   as the chaise drives away mary stands bewildered and perplexed on the door step her mind in a tumult of excitement in which hatred of the doctor distrust and suspicion of her mother disappointment vexation and ill humor surge and swell among those delicate organizations on which the structure and development of the soul so closely depend doing perhaps an irreparable injury     i   
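
A quick way to flag results like the ones above automatically is to score each hypothesis against its ground truth and check whether every utterance received the same output. The sketch below is a minimal, standard-library version; `word_error_rate` and `is_degenerate` are hypothetical helper names, not part of TensorFlowASR.

```python
from collections import Counter


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # At the start of iteration i, prev[j] is the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


def is_degenerate(hypotheses):
    """True when two or more utterances all received the same hypothesis."""
    return len(hypotheses) > 1 and len(Counter(hypotheses)) == 1


# Two of the rows shown above: ground truth paired with the single-letter output.
rows = [
    ("the three modes of management", "i"),
    ("by reason and affection", "i"),
]
scores = [word_error_rate(ref, hyp) for ref, hyp in rows]
collapsed = is_degenerate([hyp for _, hyp in rows])
```

A run where `collapsed` is true usually says more about the model (or the decoding path) than the exact error rate does.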

nglehuy commented 3 years ago

@AgaDob I haven't had a chance to properly train and test the streaming transducer model, but you can customize the training in your own way to overcome the underfitting, for example by changing the optimizer or applying a learning rate schedule.
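
As one possible shape for such a learning rate schedule, here is a minimal sketch of the warmup-then-inverse-square-root schedule from the original Transformer paper. It is only an example of the idea, not the schedule this repo uses for the streaming transducer, and the `d_model` and `warmup_steps` defaults are illustrative assumptions.

```python
def warmup_inverse_sqrt_lr(step, d_model=320, warmup_steps=10000, max_lr=None):
    """Linear warmup for `warmup_steps`, then decay proportional to 1/sqrt(step)."""
    step = max(step, 1)  # avoid raising 0 to a negative power at step 0
    lr = d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
    if max_lr is not None:
        lr = min(lr, max_lr)  # optional cap on the peak learning rate
    return lr


# The rate rises during warmup, peaks at `warmup_steps`, then decays.
early = warmup_inverse_sqrt_lr(1)
middle = warmup_inverse_sqrt_lr(5000)
peak = warmup_inverse_sqrt_lr(10000)
late = warmup_inverse_sqrt_lr(40000)
```

The per-step value can be passed to whichever optimizer is in use; the warmup length and the peak rate are the knobs most worth tuning when a model fails to converge.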

AgaDob commented 3 years ago

You are right, I think it's an issue with the model not converging. I've tried a tiny architecture and the testing scripts work perfectly. Thank you for the fantastic resource! :raised_hands:

On a different note: why is the validation loss lower than the training loss for these models?

nglehuy commented 3 years ago

@AgaDob Maybe you are applying SpecAugment, as in the example config. That would make the validation loss lower than the training loss.
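
To see why augmentation can have this effect: SpecAugment-style masking is applied to training inputs only, so the training loss is measured on deliberately corrupted spectrograms while the validation loss is measured on clean ones. The sketch below is a simplified, pure-Python illustration with one time mask and one frequency mask; `spec_augment` and its parameters are hypothetical and do not mirror the repo's augmentation code.

```python
import random


def spec_augment(spectrogram, training, max_time_mask=10, max_freq_mask=5, rng=random):
    """Zero one random time band and one random frequency band when training.

    `spectrogram` is a list of frames, each a list of frequency-bin values.
    At evaluation time the input is returned unchanged.
    """
    if not training:
        return spectrogram
    num_frames, num_bins = len(spectrogram), len(spectrogram[0])
    masked = [frame[:] for frame in spectrogram]  # copy so the input stays untouched

    # Time mask: zero `t` consecutive frames starting at `t0`.
    t = rng.randint(0, min(max_time_mask, num_frames))
    t0 = rng.randint(0, num_frames - t)
    for i in range(t0, t0 + t):
        masked[i] = [0.0] * num_bins

    # Frequency mask: zero `f` consecutive bins starting at `f0` in every frame.
    f = rng.randint(0, min(max_freq_mask, num_bins))
    f0 = rng.randint(0, num_bins - f)
    for frame in masked:
        frame[f0:f0 + f] = [0.0] * f
    return masked


# Toy spectrogram (illustrative size): 20 frames of 8 bins, all ones.
spec = [[1.0] * 8 for _ in range(20)]
train_view = spec_augment(spec, training=True, rng=random.Random(0))
eval_view = spec_augment(spec, training=False)
```

Comparing the two losses on the same footing means evaluating the training set without augmentation as well.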