bojone / bert4keras

Keras implementation of transformers for humans
https://kexue.fm/archives/6915
Apache License 2.0

RealFormer build error; please provide a test example of RealFormer usage #347

Closed xsthunder closed 3 years ago

xsthunder commented 3 years ago

This is probably a problem with the build_transformer_model arguments when constructing the model. I couldn't find any usage example for the residual_attention_scores parameter in the examples; please provide one.

Basic information

Core code

# Paste your core code here.
# Keep only the key parts; don't blindly paste all of it.
from pathlib import Path
bert_weight_path = Path("~/bert/chinese_roberta_wwm_ext_L-12_H-768_A-12").expanduser()
config_path = str(bert_weight_path/'bert_config.json')
checkpoint_path = str(bert_weight_path/'bert_model.ckpt')
dict_path = str(bert_weight_path/'vocab.txt')
import os
os.environ["CUDA_VISIBLE_DEVICES"] = '2'
os.environ["RECOMPUTE"] = '1'
os.environ["KERAS_BACKEND"] = 'tensorflow'
os.environ['TF_KERAS'] = '1'  # must use tf.keras
import tensorflow as tf
tf.compat.v1.disable_eager_execution()
from bert4keras.models import build_transformer_model
model = build_transformer_model(
    config_path,
    checkpoint_path,
    application='encoder',
    return_keras_model=False, 
    residual_attention_scores=True
)

Output

/usr/local/lib/python3.8/dist-packages/bert4keras/models.py in apply(self, inputs, layer, arguments, **kwargs)
    157                         inputs = inputs[:3] + [a_bias] + inputs[4:]
    158                         arguments['a_bias'] = True
--> 159                     o, a = self.layers[name](inputs, **arguments)
    160                     self.attention_scores = a
    161                     return o

ValueError: Tried to convert 'input' to a tensor and failed. Error: Shapes must be equal rank, but are 3 and 4
    From merging shape 0 with other shapes. for '{{node Transformer-0-MultiHeadSelfAttention/Identity/packed}} = Pack[N=2, T=DT_FLOAT, axis=0](Transformer-0-MultiHeadSelfAttention/mul_1, Transformer-0-MultiHeadSelfAttention/sub_1)' with input shapes: [?,?,768], [?,12,?,?].

My attempts

Whatever the problem is, please try to solve it yourself first, and only ask after you still can't solve it despite your best efforts. Paste your attempts here.

Attempt 1: verify the environment

Removing residual_attention_scores=True, the model builds successfully.
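
For reference, the same call with that one flag removed builds cleanly:

# Identical call, minus residual_attention_scores; builds without error.
model = build_transformer_model(
    config_path,
    checkpoint_path,
    application='encoder',
    return_keras_model=False,
)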

Attempt 2: find the op in the graph and dynamically build a bypass for the attention-weight matrix

Failed. In TF1, a tensor produced by a graph op can't be used to build a keras.Model; its value can only be obtained by running the op in a session.

# Grab the attention layer, then fish the softmax op out of the graph.
att_layer = model.apply(name="Transformer-1-MultiHeadSelfAttention")
softmax_op = att_layer.output.graph.get_operation_by_name("Transformer-0-MultiHeadSelfAttention/Softmax")
tf.keras.Model(model.inputs, softmax_op.outputs)  # fails: raw TF1 op outputs are not Keras tensors

InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: You must feed a value for placeholder tensor 'Input-Token' with dtype float and shape [?,?]
     [[{{node Input-Token}}]]
     [[Transformer-0-MultiHeadSelfAttention/mul/_1183]]
  (1) Invalid argument: You must feed a value for placeholder tensor 'Input-Token' with dtype float and shape [?,?]
     [[{{node Input-Token}}]]
0 successful operations.
0 derived errors ignored.
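
Concretely, the op's value can only be fetched after the fact, by running it in a session and feeding the model's input placeholders directly, along these lines (a sketch; the Input-Token / Input-Segment placeholder names are the ones from the error log above):

import numpy as np
import tensorflow as tf
from bert4keras.tokenizers import Tokenizer

tokenizer = Tokenizer(dict_path, do_lower_case=True)
token_ids, segment_ids = tokenizer.encode(u'测试文本')

# Fetch the raw softmax tensor by feeding the graph's input placeholders.
attn_scores = tf.compat.v1.keras.backend.get_session().run(
    softmax_op.outputs[0],
    feed_dict={
        'Input-Token:0': np.array([token_ids]),
        'Input-Segment:0': np.array([segment_ids]),
    },
)
print(attn_scores.shape)  # expected (1, num_heads, seq_len, seq_len)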

Switching to TF 1.15

Python 3.6.9 (Ubuntu 18.04), tf.keras: '2.2.4-tf', tf: '1.15.4', bert4keras: 0.9.9, weights: chinese_roberta_wwm_ext_L-12_H-768_A-12

/usr/local/lib/python3.6/dist-packages/bert4keras/models.py in apply(self, inputs, layer, arguments, **kwargs)
    157                         inputs = inputs[:3] + [a_bias] + inputs[4:]
    158                         arguments['a_bias'] = True
--> 159                     o, a = self.layers[name](inputs, **arguments)
    160                     self.attention_scores = a
    161                     return o

The output message changes:

ValueError: Tried to convert 'input' to a tensor and failed. Error: Shapes must be equal rank, but are 3 and 4
    From merging shape 0 with other shapes. for 'Transformer-0-MultiHeadSelfAttention_1/Identity/packed' (op: 'Pack') with input shapes: [?,?,768], [?,12,?,?].

As before, removing residual_attention_scores=True makes the build succeed.

bojone commented 3 years ago

You mean RealFormer, right?

Under tf 1.15, I tested both keras and tf.keras, and the above model runs successfully in both. So the model's implementation itself is fine.
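
For reference, a minimal test along these lines reproduces this (a sketch; the paths are placeholders):

import os
os.environ['TF_KERAS'] = '1'  # or leave unset to test plain keras; both worked
import numpy as np
from bert4keras.models import build_transformer_model
from bert4keras.tokenizers import Tokenizer

config_path = '/path/to/bert_config.json'    # placeholder paths
checkpoint_path = '/path/to/bert_model.ckpt'
dict_path = '/path/to/vocab.txt'

model = build_transformer_model(
    config_path,
    checkpoint_path,
    residual_attention_scores=True,  # RealFormer: residual attention scores across layers
)

tokenizer = Tokenizer(dict_path, do_lower_case=True)
token_ids, segment_ids = tokenizer.encode(u'语言模型')
print(model.predict([np.array([token_ids]), np.array([segment_ids])]))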

Have you tried removing tf.compat.v1.disable_eager_execution() to see whether it works then?

I'm rather resistant to tf 2.x and don't plan to develop specifically for it for now; the best I can do is try to keep supporting both~

xsthunder commented 3 years ago

Removing tf.compat.v1.disable_eager_execution() didn't help.

I'll set this aside for now. Thanks, 苏神, for the quick reply.