Open geolvr opened 3 years ago
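This text classification example runs fine, but converting task_sequence_labeling_ner_crf.py into a single-machine multi-GPU version fails. The only difference is the extra CRF layer, and nothing I have tried works. `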
class data_generator(DataGenerator):
    """Data generator
    (only needs to return one sample at a time)
    """
    def __iter__(self, random=False):
        for is_end, item in self.sample(random):
            ......
            yield [token_ids, segment_ids], [labels]
strategy = tf.distribute.MirroredStrategy()  # set up the single-machine multi-GPU strategy
with strategy.scope():  # apply the strategy
    bert = build_transformer_model(
        config_path,
        checkpoint_path=None,  # pretrained weights need not be loaded at this point
        return_keras_model=False,  # return the bert4keras object rather than the keras model
    )
    model = bert.model  # this is the actual keras model
    output_layer = 'Transformer-%s-FeedForward-Norm' % (bert_layers - 1)
    output = model.get_layer(output_layer).output
    output = Dense(num_labels, name='out')(output)
    CRF = ConditionalRandomField(lr_multiplier=crf_lr_multiplier, name='crf')
    output = CRF(output)
    model = Model(model.input, output)
    model.compile(loss=CRF.sparse_loss,
                  optimizer=Adam(learing_rate),
                  metrics=[CRF.sparse_accuracy])
    model.summary()
    bert.load_weights_from_checkpoint(checkpoint_path)  # pretrained weights must be loaded last
pos_num, neg_num, train_data = load_data('data/brand_sample_yiliao.val')
train_generator = data_generator(train_data, batch_size)
sample_len = math.ceil((pos_num + neg_sample_rate * neg_num) / batch_size)
dataset = train_generator.to_dataset(
    types=[('float32', 'float32'), ('float32',)],
    shapes=[([None], [None]), ([None],)],  # works with padded_batch=True below to pad automatically
    names=[('Input-Token', 'Input-Segment'), ('out',)],
    padded_batch=True
)  # the data must be converted to tf.data.Dataset format; names correspond to the input layer names
model.fit(dataset,
          steps_per_epoch=int(sample_len / split_num),
          verbose=2,
          epochs=epochs)
`
Error message:
tensorflow.python.framework.errors_impl.FailedPreconditionError: 2 root error(s) found.
(0) Failed precondition: Error while reading resource variable out/bias/replica_2 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/out/bias/replica_2/N10tensorflow3VarE does not exist.
[[{{node replica_2/out/BiasAdd/ReadVariableOp}}]]
(1) Failed precondition: Error while reading resource variable out/bias/replica_2 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/out/bias/replica_2/N10tensorflow3VarE does not exist.
[[{{node replica_2/out/BiasAdd/ReadVariableOp}}]]
[[GroupCrossDeviceControlEdges_0/Adam/Adam/update_2/Const/_12024]]
苏神, could you please help take a look?
When asking a question, please provide the following information where possible:
Basic information
Core code
I noticed that in task_seq2seq_autotitle.py the data generator yields [batch_token_ids, batch_segment_ids], None, whereas in its multi-GPU version, task_seq2seq_autotitle_multigpu.py, it yields token_ids, segment_ids. In task_sentiment_albert.py the data generator yields [batch_token_ids, batch_segment_ids], batch_labels, which includes a label item. If I convert that script into a multi-GPU version, what should the data generator yield? Also,
how should this part be modified?
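To make the difference between the two styles concrete, here is a minimal pure-Python sketch of how I understand it: the single-GPU scripts yield already padded batches, while the multi-GPU scripts yield one unpadded sample at a time and leave batching and padding to a later step. It does not use bert4keras or TensorFlow. The names `sample_generator`, `pad`, `padded_batches`, and `toy_data` are hypothetical and exist only for illustration, and the (inputs, label) layout per sample is an assumption rather than a confirmed answer.

```python
from itertools import islice


def sample_generator(data):
    """Yield one unpadded sample at a time: (token_ids, segment_ids), label."""
    for token_ids, label in data:
        segment_ids = [0] * len(token_ids)
        yield (token_ids, segment_ids), label


def pad(seqs, value=0):
    """Right-pad every sequence to the longest length in the batch."""
    length = max(len(s) for s in seqs)
    return [s + [value] * (length - len(s)) for s in seqs]


def padded_batches(samples, batch_size):
    """Group single samples into padded batches.

    This is only a stand-in for the batching and padding step that the
    multi-GPU scripts delegate to the dataset (padded_batch=True).
    """
    it = iter(samples)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        tokens = pad([inputs[0] for inputs, _ in chunk])
        segments = pad([inputs[1] for inputs, _ in chunk])
        labels = [label for _, label in chunk]
        yield (tokens, segments), labels


# Hypothetical toy data: (token_ids, label) pairs of different lengths.
toy_data = [([101, 7, 8, 102], 1), ([101, 9, 102], 0), ([101, 5, 6, 4, 102], 1)]
batches = list(padded_batches(sample_generator(toy_data), batch_size=2))
```

In this sketch the generator never sees `batch_size`; only the batching step does, which is the same split of responsibilities that the comment "only needs to return one sample at a time" describes.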
Output
My own attempts
I tried many variations and all of them raise errors, including: