Open elihuan1990 opened 3 years ago
You can fine-tune it the Sentence-BERT way.
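For readers unfamiliar with the Sentence-BERT setup, here is a rough sketch of the two objectives it is usually associated with: a regression objective (cosine similarity of the two pooled sentence vectors, trained with mean squared error against the gold score) and a classification objective (a classifier over the concatenation of `u`, `v`, and `|u - v|`). The toy vectors below stand in for encoder outputs, and every function name is made up for illustration; in a real run the vectors would come from the SimBERT encoder and the loss would be minimized inside a deep learning framework.

```python
import math


def mean_pool(token_vectors):
    """Average the token vectors of one sentence into a single sentence vector."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[d] for vec in token_vectors) / n for d in range(dim)]


def cosine(u, v):
    """Cosine similarity between two sentence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_mse_loss(pairs, labels):
    """Regression objective: mean squared error between cos(u, v) and the gold score."""
    errors = [(cosine(u, v) - y) ** 2 for (u, v), y in zip(pairs, labels)]
    return sum(errors) / len(errors)


def classification_features(u, v):
    """Classification objective input: the concatenation (u, v, |u - v|)."""
    return list(u) + list(v) + [abs(a - b) for a, b in zip(u, v)]


# Toy example: two sentence pairs with hypothetical 2-dimensional embeddings.
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
labels = [1.0, 0.0]  # 1 = similar, 0 = not similar
print(cosine_mse_loss(pairs, labels))
```

Both sentences of a pair go through the same encoder with shared weights, so only one set of parameters is updated during fine-tuning.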
I finished training the model with simbert.py and saved best_model.weights. How do I load the best_model.weights model and test it?

```python
from bert4keras.tokenizers import Tokenizer
from bert4keras.models import build_transformer_model
from keras.models import Model
import numpy as np

config_path = '/home/rca/research/simbert/root/kg/bert/chinese_simbert_L-12_H-768_A-12/bert_config.json'
checkpoint_path = './latest_model.ckpt'
dict_path = '/home/rca/research/simbert/root/kg/bert/chinese_simbert_L-12_H-768_A-12/vocab.txt'

tokenizer = Tokenizer(dict_path, do_lower_case=True)

bert = build_transformer_model(
    config_path,
    checkpoint_path,
    with_pool='linear',
    application='unilm',
    return_keras_model=False,
)
model = Model(inputs=bert.model.inputs, outputs=bert.model.outputs)
model.load_weights(checkpoint_path, by_name=True)  # by_name=True is needed when loading the weights

test_sentence = "微信和支付宝哪个好?"  # "Which is better, WeChat or Alipay?"

def gen_similar_sentences(text, n=10, k=10):
    similar_sentences = gen_synonyms(text, n, k)  # the gen_synonyms function still needs to be defined
    return similar_sentences

token_ids, segment_ids = tokenizer.encode(test_sentence, max_length=maxlen)

output_ids = model.predict([np.array([token_ids]), np.array([segment_ids])])
output_ids = output_ids[0].argmax(axis=1)

generated_sentence = tokenizer.decode(output_ids)

print(f"Original sentence: {test_sentence}")
print(f"Generated sentence: {generated_sentence}")
print("Similar sentences:")
similar_sentences = gen_similar_sentences(test_sentence)
for idx, sentence in enumerate(similar_sentences):
    print(f"{idx + 1}. {sentence}")
```

Is this the right way to write it?
My approach is to simply `from simbert import gen_synonyms`; that way the model loads the new weights.
I fine-tuned SimBERT on the LCQMC dataset, and the Spearman score on the test set dropped by one point. How should SimBERT be fine-tuned?
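For anyone wanting to reproduce this kind of comparison, here is a minimal sketch of how a Spearman score between predicted similarities and gold labels can be computed. It assumes the predictions are cosine similarities of sentence-vector pairs and the labels are 0/1 as in LCQMC; the helper names are made up for illustration, and in practice a library routine such as `scipy.stats.spearmanr` would normally be used instead. Tied values get the average of their ranks, which matters here because binary labels are heavily tied.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(predictions, labels):
    """Spearman correlation: Pearson correlation of the rank-transformed inputs."""
    return pearson(average_ranks(predictions), average_ranks(labels))


# Toy example: hypothetical predicted similarities and 0/1 gold labels.
predicted = [0.9, 0.1, 0.8, 0.2]
gold = [1, 0, 1, 0]
print(round(spearman(predicted, gold), 4))
```

Running the same evaluation on the test set before and after fine-tuning, with identical pooling and preprocessing, makes it easier to tell whether a one-point drop comes from the training objective or from a difference in the evaluation setup.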