NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
Apache License 2.0

demo_streaming_asr.py, AssertionError #532

Open HunbeomBak opened 4 years ago

HunbeomBak commented 4 years ago

Please understand that my English skill is not good.

I tested the microphone demo.

The model was trained on my own dataset, and I added the interactive_infer_params section to the config file:

    interactive_infer_params = {
        "data_layer": Speech2TextDataLayer,
        "data_layer_params": {
            "num_audio_features": 64,
            "input_type": "logfbank",
            "vocab_file": "open_seq2seq/test_utils/toy_speech_data/vocab.txt",
            "dataset_files": [],
            "shuffle": False,
        },
    }

OpenSeq2Seq was installed with Docker, and training and inference ran without problems. I used jasper-Mini-for-Jetson.py as the config file for both training and inference.

Below is a copy of the executed result.

    root@f31b402db666:/data/ASR/OpenSeq2Seq# python demo_streaming_asr.py
    ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
    ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
    ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
    ALSA lib pcm_route.c:867:(find_matching_chmap) Found no matching channel map
    Available audio input devices:
    0 HDA Intel PCH: ALC1220 Analog (hw:0,0)
    2 HDA Intel PCH: ALC1220 Alt Analog (hw:0,2)
    11 Microsoft® LifeCam HD-3000: USB Audio (hw:3,0)
    12 sysdefault
    22 default
    Please type input device ID: 11

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0. For more information, please see:

Restoring from the latest checkpoint Inference config: {'batch_size_per_gpu': 1, 'data_layer': <class 'open_seq2seq.data.speech2text.speech2text.Speech2TextDataLayer'>, 'data_layer_params': {'backend': 'librosa', 'dataset_files': [], 'dither': 1e-05, 'input_type': 'logfbank', 'norm_per_feature': True, 'num_audio_features': 64, 'pad_to': 0, 'precompute_mel_basis': True, 'sample_freq': 16000, 'shuffle': False, 'vocab_file': 'open_seq2seq/test_utils/toy_speech_data/vocab.txt', 'window': 'hanning'}, 'decoder': <class 'open_seq2seq.decoders.fc_decoders.FullyConnectedCTCDecoder'>, 'decoder_params': {'infer_logits_to_pickle': True, 'initializer': <function xavier_initializer at 0x7f926707ea60>, 'use_language_model': False}, 'dtype': tf.float32, 'encoder': <class 'open_seq2seq.encoders.tdnn_encoder.TDNNEncoder'>, 'encoder_params': {'activation_fn': <function relu at 0x7f913e994730>, 'convnet_layers': [{'dilation': [1], 'kernel_size': [11], 'num_channels': 256, 'padding': 'SAME', 'repeat': 1, 'stride': [2], 'type': 'sep_conv1d'}, {'dilation': [1], 'kernel_size': [11], 'num_channels': 256, 'padding': 'SAME', 'repeat': 3, 'residual': True, 'residual_dense': False, 'stride': [1], 'type': 'sep_conv1d'}, {'dilation': [1], 'kernel_size': [11], 'num_channels': 256, 'padding': 'SAME', 'repeat': 3, 'residual': True, 'residual_dense': False, 'stride': [1], 'type': 'sep_conv1d'}, {'dilation': [1], 'kernel_size': [13], 'num_channels': 256, 'padding': 'SAME', 'repeat': 3, 'residual': True, 'residual_dense': False, 'stride': [1], 'type': 'sep_conv1d'}, {'dilation': [1], 'kernel_size': [13], 'num_channels': 256, 'padding': 'SAME', 'repeat': 3, 'residual': True, 'residual_dense': False, 'stride': [1], 'type': 'sep_conv1d'}, {'dilation': [1], 'kernel_size': [17], 'num_channels': 512, 'padding': 'SAME', 'repeat': 3, 'residual': True, 'residual_dense': False, 'stride': [1], 'type': 'sep_conv1d'}, {'dilation': [1], 'kernel_size': [17], 'num_channels': 512, 'padding': 'SAME', 'repeat': 3, 
'residual': True, 'residual_dense': False, 'stride': [1], 'type': 'sep_conv1d'}, {'dilation': [1], 'kernel_size': [21], 'num_channels': 512, 'padding': 'SAME', 'repeat': 3, 'residual': True, 'residual_dense': False, 'stride': [1], 'type': 'sep_conv1d'}, {'dilation': [1], 'kernel_size': [21], 'num_channels': 512, 'padding': 'SAME', 'repeat': 3, 'residual': True, 'residual_dense': False, 'stride': [1], 'type': 'sep_conv1d'}, {'dilation': [1], 'kernel_size': [25], 'num_channels': 512, 'padding': 'SAME', 'repeat': 3, 'residual': True, 'residual_dense': False, 'stride': [1], 'type': 'sep_conv1d'}, {'dilation': [1], 'kernel_size': [25], 'num_channels': 512, 'padding': 'SAME', 'repeat': 3, 'residual': True, 'residual_dense': False, 'stride': [1], 'type': 'sep_conv1d'}, {'dilation': [2], 'kernel_size': [29], 'num_channels': 512, 'padding': 'SAME', 'repeat': 1, 'stride': [1], 'type': 'sep_conv1d'}, {'dilation': [1], 'kernel_size': [1], 'num_channels': 1024, 'padding': 'SAME', 'repeat': 1, 'stride': [1], 'type': 'sep_conv1d'}], 'data_format': 'channels_last', 'dropout_keep_prob': 1.0, 'initializer': <function xavier_initializer at 0x7f926707ea60>, 'initializer_params': {'uniform': False}, 'normalization': 'batch_norm', 'use_conv_mask': True}, 'eval_steps': 2200, 'iter_size': 1, 'larc_params': {'larc_eta': 0.001}, 'logdir': '/data2/model/20200330_LDC_ATCOSIM_mini', 'loss': <class 'open_seq2seq.losses.ctc_loss.CTCLoss'>, 'loss_params': {}, 'lr_policy': <function poly_decay at 0x7f9254fa3d90>, 'lr_policy_params': {'learning_rate': 0.02, 'min_lr': 1e-05, 'power': 2.0}, 'num_checkpoints': 2, 'num_epochs': 100, 'num_gpus': 1, 'optimizer': <class 'open_seq2seq.optimizers.novograd.NovoGrad'>, 'optimizer_params': {'beta1': 0.95, 'beta2': 0.98, 'epsilon': 1e-08, 'grad_averaging': False, 'weight_decay': 0.001}, 'print_loss_steps': 100, 'print_samples_steps': 2200, 'random_seed': 0, 'save_checkpoint_steps': 1100, 'save_summaries_steps': 100, 'summaries': ['learning_rate', 'variables', 
'gradients', 'larc_summaries', 'variable_norm', 'gradient_norm', 'global_gradient_norm'], 'use_horovod': False, 'use_xla_jit': False} Building graph on GPU:0 WARNING:tensorflow:From /data/ASR/OpenSeq2Seq/open_seq2seq/parts/cnns/conv_blocks.py:192: separable_conv1d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version. Instructions for updating: Use keras.layers.separable_conv1d instead. WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. Instructions for updating: Colocations handled automatically by placer. WARNING:tensorflow:From /data/ASR/OpenSeq2Seq/open_seq2seq/parts/cnns/conv_blocks.py:223: batch_normalization (from tensorflow.python.layers.normalization) is deprecated and will be removed in a future version. Instructions for updating: Use keras.layers.batch_normalization instead. WARNING:tensorflow:From /data/ASR/OpenSeq2Seq/open_seq2seq/encoders/tdnn_encoder.py:255: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version. Instructions for updating: Please use rate instead of keep_prob. Rate should be set to rate = 1 - keep_prob. WARNING:tensorflow:From /data/ASR/OpenSeq2Seq/open_seq2seq/decoders/fc_decoders.py:139: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version. Instructions for updating: Use keras.layers.dense instead. Inference Mode. Loss part of graph isn't built. 2020-04-01 05:06:12.246228: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3300000000 Hz 2020-04-01 05:06:12.248433: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x7ddb280 executing computations on platform Host. 
Devices: 2020-04-01 05:06:12.248487: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): , 2020-04-01 05:06:12.455382: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x7de1000 executing computations on platform CUDA. Devices: 2020-04-01 05:06:12.455442: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): TITAN Xp, Compute Capability 6.1 2020-04-01 05:06:12.455462: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (1): TITAN Xp, Compute Capability 6.1 2020-04-01 05:06:12.456070: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582 pciBusID: 0000:67:00.0 totalMemory: 11.91GiB freeMemory: 191.50MiB 2020-04-01 05:06:12.456185: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties: name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582 pciBusID: 0000:68:00.0 totalMemory: 11.91GiB freeMemory: 173.38MiB 2020-04-01 05:06:12.456579: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0, 1 2020-04-01 05:06:13.345461: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2020-04-01 05:06:13.345502: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 1 2020-04-01 05:06:13.345508: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N Y 2020-04-01 05:06:13.345513: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: Y N 2020-04-01 05:06:13.349140: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 121 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:67:00.0, compute capability: 6.1) 2020-04-01 05:06:13.349769: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 
with 103 MB memory) -> physical GPU (device: 1, name: TITAN Xp, pci bus id: 0000:68:00.0, compute capability: 6.1)
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/util/decorator_utils.py:145: GraphKeys.VARIABLES (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating: Use tf.GraphKeys.GLOBAL_VARIABLES instead.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating: Use standard file APIs to check for files with this prefix.
Initialization was successful
Traceback (most recent call last):
  File "demo_streaming_asr.py", line 28, in callback
    pred = asr.transcribe(signal)
  File "/data/ASR/OpenSeq2Seq/frame_asr.py", line 237, in transcribe
    return self._decode(frame, self.offset, self.merge)
  File "/data/ASR/OpenSeq2Seq/frame_asr.py", line 195, in _decode
    assert len(frame)==self.n_frame_len
AssertionError
Traceback (most recent call last):
  File "demo_streaming_asr.py", line 45, in <module>
    time.sleep(0.1)
AssertionError
root@f31b402db666:/data/ASR/OpenSeq2Seq#

I tried to find the cause of the problem.

In frame_asr.py, I added two print statements at the top of transcribe():

    def transcribe(self, frame=None):
        # Debug output: shape of the incoming frame and the expected length
        print(np.shape(frame))
        print(self.n_frame_len)

        if frame is None:
            frame = np.zeros(shape=self.n_frame_len, dtype=np.float32)
        if len(frame) < self.n_frame_len:
            frame = np.pad(frame, [0, self.n_frame_len - len(frame)], 'constant')
        return self._decode(frame, self.offset, self.merge)

Result:

    (32000,)
    3200

It looks like a shape mismatch: the incoming frame has 32000 samples, but n_frame_len is 3200, so the assertion in _decode fails.

How can I solve this problem?
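To make the mismatch concrete: if the microphone stream uses the 16000 Hz sample_freq shown in the inference config (an assumption), 3200 samples is 0.2 s of audio and 32000 samples is 2 s, so each callback delivers ten times what _decode expects. Below is a minimal sketch of one possible workaround, cutting the incoming buffer into chunks of exactly n_frame_len samples. The helper split_into_frames is hypothetical and is not part of OpenSeq2Seq; it only illustrates the arithmetic.

```python
import numpy as np


def split_into_frames(signal, n_frame_len):
    """Hypothetical helper: cut a 1-D buffer into chunks of exactly n_frame_len samples."""
    signal = np.asarray(signal, dtype=np.float32)
    # Zero-pad the tail so the total length is a whole multiple of n_frame_len.
    remainder = len(signal) % n_frame_len
    if remainder:
        signal = np.pad(signal, [0, n_frame_len - remainder], 'constant')
    # After reshaping, each row is one frame of n_frame_len samples.
    return list(signal.reshape(-1, n_frame_len))


# Sizes taken from the debug prints in this report.
buffer = np.zeros(32000, dtype=np.float32)
frames = split_into_frames(buffer, 3200)
print(len(frames), frames[0].shape)  # 10 (3200,)
```

Each chunk would then satisfy the len(frame) == n_frame_len check when passed to transcribe() one at a time. Whether this is the intended fix is not confirmed here; making the audio stream's buffer size match the frame length used by the model may be the more direct route.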