Z-yq / TensorflowASR

A project dedicated to bringing CPU/on-device models close to GPU-model performance; the real-time factor (RTF) on CPU is below 0.1
Apache License 2.0

Random "generator" data-fetching error and how to handle it #5

Closed phecda-xu closed 4 years ago

phecda-xu commented 4 years ago

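Hello,

I have a small question.

I am training on CPU (Linux 16.04) with data from a few speakers in aishell_1 (2,100 audio clips, used only to verify the code). The model is ConformerTransducer, with all other parameters left at their defaults.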

2020-10-05 10:28:11,241 - root - INFO - trainer resume failed
2020-10-05 10:28:11,241 - root - INFO - trainer resume failed
[Train] [Epoch 1/2] |                    | 7/2096 [00:36<2:07:57,  3.68s/batch, transducer_loss=373.089]
WARNING:tensorflow:5 out of the last 6 calls to <function MultiHeadAttention.call at 0x7f911405de60> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
2020-10-05 10:28:47,972 - tensorflow - WARNING - 5 out of the last 6 calls to <function MultiHeadAttention.call at 0x7f911405de60> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
WARNING:tensorflow:5 out of the last 6 calls to <function MultiHeadAttention.call at 0x7f910c7b83b0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
2020-10-05 10:28:48,185 - tensorflow - WARNING - 5 out of the last 6 calls to <function MultiHeadAttention.call at 0x7f910c7b83b0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
WARNING:tensorflow:5 out of the last 6 calls to <function MultiHeadAttention.call at 0x7f910c7648c0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
...
[Train] [Epoch 1/2] |████▊               | 500/2096 [09:30<26:07,  1.02batch/s, Successfully Saved Checkpoint]
...
[Train] [Epoch 1/2] |█████▏              | 547/2096 [10:39<23:14,  1.11batch/s, transducer_loss=85.972]
...
ValueError: `generator` yielded an element of shape (0,) where an element of shape (None, None, 80, 1) was expected.

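The error occurred at step 547 in this run, but it is not tied to a fixed step; it shows up at random.

After reading through the internal data-processing code, I found that your data-processing script applies the following filter: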

if len(data) < 400:
    continue
elif len(data) > self.speech_featurizer.sample_rate * 7:
    continue

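In other words, audio (sampled at 16 kHz) shorter than 25 ms or longer than 7 s is discarded. When every clip in a batch is longer than 7 s, the whole batch is discarded and the generator yields an empty result (None), which causes the error above.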
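To make the thresholds concrete, here is a minimal sketch of the filter described above. The 16 kHz sample rate and the two limits come from the post; `filter_batch` and the sample lengths are hypothetical stand-ins, not the project's actual code.

```python
SAMPLE_RATE = 16_000             # 16 kHz, as stated in the post
MIN_SAMPLES = 400                # 400 samples at 16 kHz = 25 ms
MAX_SAMPLES = SAMPLE_RATE * 7    # 7 s worth of samples


def filter_batch(lengths):
    """Return the clip lengths (in samples) that survive the filter."""
    kept = []
    for n in lengths:
        if n < MIN_SAMPLES:
            continue
        elif n > MAX_SAMPLES:
            continue
        kept.append(n)
    return kept


# Shortest allowed clip, in milliseconds.
min_ms = MIN_SAMPLES * 1000 / SAMPLE_RATE

# A mixed batch keeps only the clip inside the allowed range.
mixed = filter_batch([300, 48_000, 120_000])

# A batch where every clip exceeds 7 s ends up empty.
all_long = filter_batch([113_000, 120_000, 160_000])

print(min_ms, MAX_SAMPLES, mixed, all_long)
```

An empty list like `all_long` is the situation that would produce an element of shape `(0,)` once it is converted to an array.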

The fix is simple: just increase the number 7.
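Raising the limit only makes an empty batch less likely. A more defensive option is to have the generator skip any batch that comes out empty after filtering. The following is a minimal sketch of that idea; `batch_generator`, `max_seconds`, and the toy data are hypothetical and not taken from the project's code.

```python
def batch_generator(clips, batch_size, sample_rate=16_000, max_seconds=7):
    """Yield lists of clips, never yielding an empty batch.

    `clips` is any sequence of per-clip sample sequences.
    """
    max_samples = sample_rate * max_seconds
    for start in range(0, len(clips), batch_size):
        batch = [
            clip for clip in clips[start:start + batch_size]
            if 400 <= len(clip) <= max_samples
        ]
        if not batch:
            # Every clip in this slice was filtered out; skip it
            # instead of yielding an empty element.
            continue
        yield batch


# Toy data: clip lengths in samples; the middle pair are both over 7 s.
toy_clips = [[0.0] * n for n in (16_000, 32_000, 120_000, 130_000, 48_000)]
batches = list(batch_generator(toy_clips, batch_size=2))
batch_lengths = [[len(c) for c in b] for b in batches]
print(batch_lengths)
```

Skipping batches changes the number of steps per epoch, so a step count computed from the dataset size becomes an upper bound rather than an exact figure.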

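That raises a question: I can understand discarding clips shorter than 25 ms, but why discard clips longer than 7 s as well? Is it because audio longer than 7 s degrades recognition quality?

When you process the AISHELL2 dataset, do you discard every clip longer than 7 s?

Also, what is going on with the "tensorflow - WARNING" messages? I could not make sense of them.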

Z-yq commented 4 years ago

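OK,

  1. The generator configuration will be improved step by step. For now, audio longer than 7 seconds is dropped to prevent GPU OOM.
  2. The TensorFlow warnings come from @tf.function. They are harmless, and tracing will settle down on its own later. I will keep debugging so that these messages eventually go away.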
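Retracing warnings of this kind typically appear when a `tf.function` is called with many different input shapes. One common way to limit that is to pad each batch's time dimension up to a multiple of a fixed bucket size, so the model sees fewer distinct shapes. The sketch below only illustrates the length arithmetic; `bucketed_length`, the bucket size of 100 frames, and the frame counts are assumptions, not the project's settings.

```python
def bucketed_length(num_frames, bucket=100):
    """Round a frame count up to the next multiple of `bucket`."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    # Ceiling division written with integer arithmetic only.
    return -(-num_frames // bucket) * bucket


# Hypothetical per-batch maximum lengths, in feature frames.
frame_counts = [312, 355, 399, 401, 698]
padded = [bucketed_length(n) for n in frame_counts]

distinct_before = len(set(frame_counts))
distinct_after = len(set(padded))
print(padded, distinct_before, distinct_after)
```

Larger buckets mean fewer distinct shapes but more padding per batch, so the bucket size is a trade-off between tracing overhead and wasted computation.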


phecda-xu commented 4 years ago

...seconds is dropped to prevent GPU OOM

Understood, thanks!