Intermittent "generator" data-fetch error and how to handle it

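Hello,

I have a small question.

I am training on CPU under Linux 16.04, using data from a few speakers in aishell_1 (2,100 audio clips, just to verify the code). The model is ConformerTransducer, with all other parameters left at their defaults.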

2020-10-05 10:28:11,241 - root - INFO - trainer resume failed
2020-10-05 10:28:11,241 - root - INFO - trainer resume failed
[Train] [Epoch 1/2] |                    | 7/2096 [00:36<2:07:57,  3.68s/batch, transducer_loss=373.089]
WARNING:tensorflow:5 out of the last 6 calls to <function MultiHeadAttention.call at 0x7f911405de60> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
2020-10-05 10:28:47,972 - tensorflow - WARNING - 5 out of the last 6 calls to <function MultiHeadAttention.call at 0x7f911405de60> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
WARNING:tensorflow:5 out of the last 6 calls to <function MultiHeadAttention.call at 0x7f910c7b83b0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
2020-10-05 10:28:48,185 - tensorflow - WARNING - 5 out of the last 6 calls to <function MultiHeadAttention.call at 0x7f910c7b83b0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
WARNING:tensorflow:5 out of the last 6 calls to <function MultiHeadAttention.call at 0x7f910c7648c0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
...
[Train] [Epoch 1/2] |████▊               | 500/2096 [09:30<26:07,  1.02batch/s, Successfully Saved Checkpoint]
...
[Train] [Epoch 1/2] |█████▏              | 547/2096 [10:39<23:14,  1.11batch/s, transducer_loss=85.972]
...
ValueError: `generator` yielded an element of shape (0,) where an element of shape (None, None, 80, 1) was expected.

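This time the error occurred at step 547, but it does not always occur at the same step; it appears at random.

After reading through the internal data-processing code, I found that your data-processing script applies the following filter: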

if len(data) < 400:
    continue
elif len(data) > self.speech_featurizer.sample_rate * 7:
    continue

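In other words, audio (sampled at 16 kHz) shorter than 25 ms or longer than 7 s is discarded. When every clip in a batch is longer than 7 s, all of them are discarded and the generator has nothing to yield (it effectively produces None, which shows up as the element of shape (0,) in the error above).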
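
To make the arithmetic concrete: at 16 kHz, 400 samples is 25 ms and 7 s is 112,000 samples. The following is a minimal sketch, not the repository's actual code. `SAMPLE_RATE`, `keep`, and `build_batch` are hypothetical names, and plain integers (clip lengths in samples) stand in for audio arrays. It shows how a batch whose clips are all longer than 7 s comes out empty.

```python
SAMPLE_RATE = 16000              # assumed 16 kHz, as stated in the issue
MIN_SAMPLES = 400                # 400 / 16000 = 0.025 s = 25 ms
MAX_SAMPLES = SAMPLE_RATE * 7    # 112000 samples = 7 s


def keep(num_samples):
    """Mirror the quoted filter: drop clips that are too short or too long."""
    return MIN_SAMPLES <= num_samples <= MAX_SAMPLES


def build_batch(clip_lengths):
    """Hypothetical stand-in for one generator step: return the surviving clips."""
    return [n for n in clip_lengths if keep(n)]


mixed = build_batch([300, 48000, 120000])   # only the 3 s clip survives
all_long = build_batch([120000, 130000])    # every clip exceeds 7 s
print(len(mixed), len(all_long))
```

With a small dataset and shuffled batches, whether a batch happens to contain only long clips depends on the shuffle, which would be consistent with the error appearing at a different step each run.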

The fix is simple: change the 7 to a larger number.
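
An alternative that keeps the length limit is to filter clips before grouping them into batches, so that a batch is never built entirely from clips that will be dropped. This is only a sketch under the same assumptions as the filter quoted above; `batches` is a hypothetical helper, not part of TensorflowASR, and clip lengths again stand in for audio data.

```python
def batches(clip_lengths, batch_size, min_samples=400, max_samples=16000 * 7):
    """Yield only non-empty batches of clips that pass the length filter."""
    batch = []
    for n in clip_lengths:
        if n < min_samples or n > max_samples:
            continue  # same rule as the quoted filter
        batch.append(n)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final partial batch; skipped entirely when empty
        yield batch


for b in batches([120000, 130000, 48000, 300, 32000, 16000], batch_size=2):
    print(b)
```

Because filtering happens before batching, a run of over-long clips simply produces no batch instead of an empty one.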

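That raises a question: I can understand discarding audio shorter than 25 ms, but why discard audio longer than 7 s as well? Is it excluded because clips over 7 s degrade recognition accuracy?

When you processed the AISHELL2 dataset, did you discard all audio longer than 7 s?

Also, what is going on with the tensorflow - WARNING messages? I could not make sense of them.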

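OK,

The generator configuration will be improved step by step. For now, audio longer than 7 seconds is dropped to prevent GPU OOM.
The TensorFlow warnings come from @tf.function and have no effect; the functions will be traced automatically later on. I will keep debugging so that these messages eventually go away.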

Z-yq / TensorflowASR

Intermittent "generator" data-fetch error and how to handle it #5