Could you provide an example of BERT text classification using multiple GPUs?

geolvr commented 3 years ago

When asking a question, please provide as much of the following information as possible:

Basic information

Your operating system: Linux
Your Python version: 3.7.1
Your TensorFlow version: 1.14.0
Your Keras version: 2.3.1
Your bert4keras version:
Pure keras or tf.keras:
Pretrained model loaded: roformer, bert

Core code

I noticed that in task_seq2seq_autotitle.py the data generator yields [batch_token_ids, batch_segment_ids], None, whereas in its multi-GPU version, task_seq2seq_autotitle_multigpu.py, it yields token_ids, segment_ids. In task_sentiment_albert.py the data generator yields [batch_token_ids, batch_segment_ids], batch_labels, which includes a label item. If I convert that script into a multi-GPU version, what should the data generator yield? Also, regarding the following:

dataset = train_generator.to_dataset(
    types=('float32', 'float32'),
    shapes=([None], [None]),  # combined with padded_batch=True below, enables automatic padding
    names=('Input-Token', 'Input-Segment'),
    padded_batch=True
)  # data must be converted to tf.data.Dataset format; names correspond to the input layer names

How should this part be modified?
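A pure-Python sketch of what the `padded_batch=True` comment above describes: within each batch, every variable-length field is zero-padded to the longest sequence in that batch. `pad_batch` and the ids below are hypothetical; the function imitates `tf.data.Dataset.padded_batch` with `None` dimensions and the default zero padding, and is not bert4keras code.

```python
from typing import List, Sequence


def pad_batch(seqs: Sequence[Sequence[int]], pad_value: int = 0) -> List[List[int]]:
    """Pad every sequence in one batch to the length of the longest one.

    Mimics tf.data padded_batch with a [None] shape and zero padding:
    the target length is decided per batch, not by a global maxlen.
    """
    longest = max(len(s) for s in seqs)
    return [list(s) + [pad_value] * (longest - len(s)) for s in seqs]


# Hypothetical token/segment ids for a batch of two samples of different lengths.
token_batch = [[101, 2769, 102], [101, 872, 1962, 8013, 102]]
segment_batch = [[0, 0, 0], [0, 0, 0, 0, 0]]

padded_tokens = pad_batch(token_batch)
padded_segments = pad_batch(segment_batch)
print(padded_tokens)  # [[101, 2769, 102, 0, 0], [101, 872, 1962, 8013, 102]]
```

Since the padded length is chosen per batch, the generator can yield unpadded `token_ids`/`segment_ids`; padding them to `maxlen` inside the generator, as some of the attempts below do, only makes every batch as long as the longest possible input.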

Output

 (0) Invalid argument: You must feed a value for placeholder tensor 'replica_1/dense_72_target' with dtype float and shape [?,?]
         [[{{node replica_1/dense_72_target}}]]
         [[loss_1/mul/_7387]]
  (1) Invalid argument: You must feed a value for placeholder tensor 'replica_1/dense_72_target' with dtype float and shape [?,?]
         [[{{node replica_1/dense_72_target}}]]

What I tried

I tried many variants and all of them raise errors, including:

class data_generator(DataGenerator):
    """Data generator
    (only needs to return one sample at a time)
    """
    def __iter__(self, random=False):
        for is_end, (text, label) in self.sample(random):
            token_ids, segment_ids = tokenizer.encode(text, maxlen=maxlen)
            token_ids = token_ids + [0] * (maxlen - len(token_ids))
            segment_ids = segment_ids + [0] * (maxlen - len(segment_ids))
            yield [token_ids, segment_ids], label

class data_generator(DataGenerator):
    """Data generator
    (only needs to return one sample at a time)
    """
    def __iter__(self, random=False):
        for is_end, (text, label) in self.sample(random):
            token_ids, segment_ids = tokenizer.encode(text, maxlen=maxlen)
            yield [token_ids, segment_ids], label

class data_generator(DataGenerator):
    """Data generator
    (only needs to return one sample at a time)
    """
    def __iter__(self, random=False):
        for is_end, (text, label) in self.sample(random):
            token_ids, segment_ids = tokenizer.encode(text, maxlen=maxlen)
            token_ids = token_ids + [0] * (maxlen - len(token_ids))
            segment_ids = segment_ids + [0] * (maxlen - len(segment_ids))
            yield token_ids, segment_ids, label

dataset = train_generator.to_dataset(
    types=('float32', 'float32', 'float32'),
    shapes=([None], [None], [None]),  # combined with padded_batch=True below, enables automatic padding
    names=('Input-Token', 'Input-Segment', 'dense_72_target'),
    padded_batch=True
)  # data must be converted to tf.data.Dataset format; names correspond to the input layer names
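The `You must feed a value for placeholder tensor '..._target'` error above means the model's label (target) placeholder never received data. Below is a pure-Python sketch of the likely mechanism, under two assumptions: (1) `to_dataset` zips `names` with each yielded sample, so a flat tuple of names produces a single dict while nested `[(input names), (output names)]` produces a tuple of two dicts; (2) Keras `fit` treats a 2-element tuple from a dataset as `(x, y)` and a single dict as `x` only, leaving `y` empty. The helpers `wrap_names` and `split_xy` and the output layer name `'dense_72'` are illustrative, not bert4keras or Keras APIs.

```python
from typing import Any, Dict, Optional, Sequence, Tuple, Union


def wrap_names(names: Sequence[Any], sample: Sequence[Any]) -> Union[Dict[str, Any], Tuple[Dict[str, Any], ...]]:
    """Attach layer names to one yielded sample (sketch of the ASSUMED to_dataset behaviour).

    Flat names   -> a single dict:    {'Input-Token': ..., 'Input-Segment': ..., ...}
    Nested names -> a tuple of dicts: ({input name: value}, {output name: value})
    """
    if isinstance(names[0], str):
        return dict(zip(names, sample))
    return tuple(dict(zip(n, s)) for n, s in zip(names, sample))


def split_xy(element: Any) -> Tuple[Any, Optional[Any]]:
    """Simplified sketch of how Keras fit splits one dataset element into (x, y)."""
    if isinstance(element, (list, tuple)) and len(element) == 2:
        return element[0], element[1]
    return element, None  # y is None: the *_target placeholder is never fed


token_ids, segment_ids, label = [101, 2769, 102], [0, 0, 0], [1]

# Flat names (as in the last attempt): the label is just another key of x, y is empty.
flat_x, flat_y = split_xy(wrap_names(
    ('Input-Token', 'Input-Segment', 'dense_72_target'),
    (token_ids, segment_ids, label)))
print(sorted(flat_x), flat_y)

# Nested names: inputs and targets land in separate dicts, so y is populated.
# 'dense_72' stands in for the name of the model's output layer (hypothetical).
x, y = split_xy(wrap_names(
    [('Input-Token', 'Input-Segment'), ('dense_72',)],
    [(token_ids, segment_ids), (label,)]))
print(sorted(x), y)
```

On the generator side this corresponds to yielding `[token_ids, segment_ids], [label]`, so each sample already has the `(inputs, targets)` shape that nested `types`/`shapes`/`names` describe. Because Keras keys a `y` dict by output layer name, the target entry would use the output layer's name rather than the `_target` placeholder name.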

bojone commented 3 years ago

Thanks for the suggestion; it has been added:

https://github.com/bojone/bert4keras/blob/master/examples/task_iflytek_multigpu.py

paddydai commented 2 years ago

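This text classification example runs fine, but I get an error when converting task_sequence_labeling_ner_crf.py into a single-machine multi-GPU version. It has an extra CRF layer, and nothing I have tried works: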

class data_generator(DataGenerator):
    """Data generator
    (only needs to return one sample at a time)
    """
    def __iter__(self, random=False):
        for is_end, item in self.sample(random):
            ......
            yield [token_ids, segment_ids], [labels]

strategy = tf.distribute.MirroredStrategy()  # create a single-machine multi-GPU strategy
with strategy.scope():  # build the model under this strategy
    bert = build_transformer_model(
        config_path,
        checkpoint_path=None,  # pretrained weights need not be loaded yet
        return_keras_model=False,  # return the bert4keras object rather than a Keras model
    )

    model = bert.model  # this is the actual Keras model
    output_layer = 'Transformer-%s-FeedForward-Norm' % (bert_layers - 1)
    output = model.get_layer(output_layer).output
    output = Dense(num_labels, name='out')(output)
    CRF = ConditionalRandomField(lr_multiplier=crf_lr_multiplier, name='crf')
    output = CRF(output)

    model = Model(model.input, output)
    model.compile(loss=CRF.sparse_loss,
                  optimizer=Adam(learing_rate),
                  metrics=[CRF.sparse_accuracy])
    model.summary()
    bert.load_weights_from_checkpoint(checkpoint_path)  # pretrained weights must be loaded last

pos_num, neg_num, train_data = load_data('data/brand_sample_yiliao.val')
train_generator = data_generator(train_data, batch_size)
sample_len = math.ceil((pos_num + neg_sample_rate * neg_num) / batch_size)
dataset = train_generator.to_dataset(
    types=[('float32', 'float32'), ('float32',)],
    shapes=[([None], [None]), ([None],)],  # combined with padded_batch=True below, enables automatic padding
    names=[('Input-Token', 'Input-Segment'), ('out',)],
    padded_batch=True
)  # data must be converted to tf.data.Dataset format; names correspond to the input layer names
model.fit(dataset,
          steps_per_epoch=int(sample_len / split_num),
          verbose=2,
          epochs=epochs)


Error message:

tensorflow.python.framework.errors_impl.FailedPreconditionError: 2 root error(s) found.
  (0) Failed precondition: Error while reading resource variable out/bias/replica_2 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/out/bias/replica_2/N10tensorflow3VarE does not exist.
         [[{{node replica_2/out/BiasAdd/ReadVariableOp}}]]
  (1) Failed precondition: Error while reading resource variable out/bias/replica_2 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/out/bias/replica_2/N10tensorflow3VarE does not exist.
         [[{{node replica_2/out/BiasAdd/ReadVariableOp}}]]
         [[GroupCrossDeviceControlEdges_0/Adam/Adam/update_2/Const/_12024]]

Su Shen (bojone), could you please take a look?
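For the sequence-labeling version, the target is a label sequence with one id per token rather than a single class id, which is why its shape is `[None]` in the `to_dataset` call above. A pure-Python sketch of how one padded batch of such `((token_ids, segment_ids), (labels,))` samples lines up, assuming labels are zero-padded just like the tokens; `pad_nested_batch` and the sample values are hypothetical, and the sketch only illustrates the data layout.

```python
from typing import List, Sequence, Tuple

# One sample: ((token_ids, segment_ids), (labels,)), labels aligned 1:1 with tokens.
Sample = Tuple[Tuple[List[int], List[int]], Tuple[List[int]]]


def pad_to(seq: Sequence[int], length: int, pad_value: int = 0) -> List[int]:
    """Right-pad seq with pad_value up to length (no truncation)."""
    return list(seq) + [pad_value] * (length - len(seq))


def pad_nested_batch(samples: Sequence[Sample]):
    """Pad tokens, segments and labels to the longest token sequence in the batch."""
    longest = max(len(inputs[0]) for inputs, _ in samples)
    tokens = [pad_to(inputs[0], longest) for inputs, _ in samples]
    segments = [pad_to(inputs[1], longest) for inputs, _ in samples]
    labels = [pad_to(targets[0], longest) for _, targets in samples]
    return (tokens, segments), (labels,)


# Hypothetical samples: one label id per token, including the [CLS]/[SEP] positions.
samples = [
    (([101, 2769, 102], [0, 0, 0]), ([0, 1, 0],)),
    (([101, 872, 1962, 8013, 102], [0, 0, 0, 0, 0]), ([0, 1, 2, 2, 0],)),
]
(tokens, segments), (labels,) = pad_nested_batch(samples)
for t, l in zip(tokens, labels):
    assert len(t) == len(l)  # labels stay aligned with tokens after padding
print(labels)  # [[0, 1, 0, 0, 0], [0, 1, 2, 2, 0]]
```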
