Training was interrupted once, and the first best_model had already been saved.
Calling pairwise_matching_task.eval(load_best_model=True) through the eval interface gives results consistent with the validation results seen during training.
[2020-12-30 17:40:13,630] [ INFO] - PaddleHub predict start
[2020-12-30 17:40:13,630] [ INFO] - The best model has been loaded
[2020-12-30 17:44:27,030] [ INFO] - PaddleHub predict finished.
Hi, do you mean the model's performance gets worse at predict time? Please provide the specific versions of paddlepaddle and paddlehub.
During training:
[dev dataset evaluation result] loss=0.39535 f1=0.93372 [step/sec: 4.69]
[2020-12-31 08:54:11,475] [ EVAL] - best model saved to /data/qy/ckpt_ernie_pointwise_matching/best_model [best f1=0.93372]
But when I run the predict interface on the dev dataset separately, the results are worse.
paddlepaddle version 1.8.0, paddlehub version 1.8.1
Did you build the dataset yourself? You could first try training with the dataset bundled with paddlehub 1.8, keep everything else unchanged, and see whether the problem still occurs. https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.8/demo/pairwise_text_matching
Does it matter that the run_states of eval and predict differ? Element 0 of the list is the same, but element 1 is different.
https://aistudio.baidu.com/bdvgpu/user/63646/1413353/notebooks/1413353.ipynb This is the official example; the accuracy drops after predict. Could you check on the backend whether I got something wrong somewhere?
I can't open that link. If convenient, please share the original aistudio tutorial with me so I can check whether the problem can be reproduced.
Feeding in text_a and text_b, or text_a, text_b, and text_c, gives the same results.
May I ask what business scenario you are using the pairwise model for? I'd like to understand the usage background and the data.
As for the issue itself, I have reproduced your situation. First, note that the predict interface only takes the first two elements of each list as text_a and text_b, so the result is the same whether or not your input contains text_c. Also, a pairwise matching task produces a matching score for each of the two sentence pairs separately and then derives the label from those scores.
I have matched the metrics from the example on my side; you can run the same check:
_ = pairwise_matching_task.eval(load_best_model=True)
[2021-01-11 16:19:57,867] [ INFO] - Load the best model from ckpt_ernie_pairtwise_matching/best_model
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py:1093: UserWarning: There are no operators in the program to be executed. If you pass Program manually, please use fluid.program_guard to ensure the current Program is being used. warnings.warn(error_info)
[2021-01-11 16:20:00,396] [ INFO] - Evaluation on dev dataset start
[2021-01-11 16:20:23,087] [ EVAL] - [dev dataset evaluation result] loss=0.46125 acc=0.85986 f1=0.86381 precision=0.85057 recall=0.87747 [step/sec: 44.07]
[2021-01-11 16:20:23,088] [ EVAL] - best model saved to ckpt_ernie_pairtwise_matching/best_model [best acc=0.85986]
# Transform dev_examples into text pairs
examples = dataset.dev_examples
text_pairs = []
for example in examples:
    # print(example)
    text_pairs.append([example.text_a, example.text_b])
    text_pairs.append([example.text_a, example.text_c])
# Predict by PaddleHub's API
results = pairwise_matching_task.predict(
    data=text_pairs,
    max_seq_len=128,
    label_list=dataset.get_labels(),
    return_result=True,
    accelerate_mode=False)
labels = [int(example.label) for example in examples]
preds = []
for index in range(len(labels)):
    left_idx, right_idx = index * 2, index * 2 + 1
    left_score, right_score = results[left_idx][0], results[right_idx][1]
    pred = 1 if left_score > right_score else 0
    preds.append(pred)
from sklearn.metrics import f1_score, accuracy_score
print(f'acc: {accuracy_score(labels, preds)}')
print(f'f1_score: {f1_score(labels, preds)}')
acc: 0.8598598598598599
f1_score: 0.8638132295719845
My results[left_idx] and results[right_idx] don't have [0] and [1] elements. Does results need to return probabilities?
You need to manually modify paddlehub/finetune/task/matching_task.py to get the probability values.
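One possible way to do this without editing the installed file in place is to subclass the task and override its post-processing hook so that predict() returns raw scores instead of decoded labels. This is only a sketch: the _postprocessing(run_states) hook and the run_results layout assumed below follow other PaddleHub 1.8 tasks, and the actual code in matching_task.py may differ.

```python
import numpy as np
import paddlehub as hub

# Hypothetical subclass (a sketch, not the actual fix inside matching_task.py):
# override post-processing so predict() returns per-class scores instead of labels.
class ProbPairwiseTextMatchingTask(hub.PairwiseTextMatchingTask):
    def _postprocessing(self, run_states):
        results = []
        for batch_state in run_states:
            # Assumption: run_results[0] holds the model's output scores
            # for each example in the batch, shape [batch_size, num_classes].
            scores = np.array(batch_state.run_results[0])
            results += scores.tolist()
        return results
```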
Thanks, the problem is solved. I wanted to compare the differences between PairwiseTextMatchingTask and PointwiseTextMatchingTask in how they handle matching problems, and their pros and cons; none of the docs cover that.
import paddlehub as hub
from paddlehub.dataset.base_nlp_dataset import TextMatchingDataset
import pandas as pd

class COVID19Competition(TextMatchingDataset):
    def __init__(self, tokenizer=None, max_seq_len=None):
        base_path = 'COVID19_sim_competition'
        super(COVID19Competition, self).__init__(
            is_pair_wise=True,  # text matching type: whether it is pairwise
            base_path=base_path,
            train_file="/data/qy/hub/COVID19_sim_competition/train.txt",  # file path relative to base_path
            dev_file="/data/qy/hub/COVID19_sim_competition/dev.txt",      # file path relative to base_path
            test_file="/data/qy/hub/COVID19_sim_competition/test.txt",
            tokenizer=tokenizer,
            max_seq_len=max_seq_len)

module = hub.Module(name="chinese-bert-wwm-ext")
inputs, outputs, program = module.context(trainable=True, max_seq_len=128, num_slots=3)
tokenizer = hub.BertTokenizer(vocab_file=module.get_vocab_path(), tokenize_chinese_chars=True)
dataset = COVID19Competition(tokenizer=tokenizer, max_seq_len=128)

strategy = hub.AdamWeightDecayStrategy(
    weight_decay=0.01,
    warmup_proportion=0.1,
    learning_rate=5e-5)

config = hub.RunConfig(
    log_interval=1000,
    eval_interval=3000,
    use_cuda=True,
    num_epoch=1,
    batch_size=16,
    checkpoint_dir='/data/qy/ckpt_ernie_pointwise_matching',
    strategy=strategy)

query = outputs["sequence_output"]
left = outputs['sequence_output_2']
right = outputs['sequence_output_3']

pairwise_matching_task = hub.PairwiseTextMatchingTask(
    query_feature=query,
    left_feature=left,
    right_feature=right,
    tokenizer=tokenizer,
    dataset=dataset,
    config=config,
    metrics_choices=['f1'])

run_states = pairwise_matching_task.finetune_and_eval()

# Build the prediction data and ground-truth labels from the dev set
df = pd.read_table("/data/qy/hub/COVID19_sim_competition/dev.txt")
pred_true = df['label']
data = df[['text_a', 'text_b', 'text_c']].values.tolist()

result = pairwise_matching_task.predict(
    data=data,
    max_seq_len=128,
    label_list=dataset.get_labels(),
    return_result=True,
    load_best_model=True,
    accelerate_mode=False)

from sklearn.metrics import f1_score, accuracy_score
print(f1_score(pred_true, result))
print(accuracy_score(pred_true, result))
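On the comparison the user asked about: pairwise training learns to score a matching (text_a, text_b) pair above a non-matching (text_a, text_c) pair for the same query, while pointwise training scores each (query, title) pair independently against a 0/1 label. A rough pointwise counterpart of the setup above is sketched below; it follows the pointwise demo shipped in the same PaddleHub 1.8 release, but the parameter names (query_feature, title_feature) and the pooled_output features are assumptions to verify against your installed version, and `dataset`/`config` are taken to be built analogously to the pairwise script above (with is_pair_wise=False and two text slots).

```python
import paddlehub as hub

# Sketch of a pointwise setup for comparison (assumptions noted above).
module = hub.Module(name="chinese-bert-wwm-ext")
# Pointwise matching feeds two text slots instead of three.
inputs, outputs, program = module.context(trainable=True, max_seq_len=128, num_slots=2)
tokenizer = hub.BertTokenizer(vocab_file=module.get_vocab_path(), tokenize_chinese_chars=True)

# `dataset` is assumed to be a TextMatchingDataset built with is_pair_wise=False,
# and `config` a hub.RunConfig like the one in the pairwise script above.
pointwise_matching_task = hub.PointwiseTextMatchingTask(
    dataset=dataset,
    query_feature=outputs["pooled_output"],
    title_feature=outputs["pooled_output_2"],
    tokenizer=tokenizer,
    config=config)

run_states = pointwise_matching_task.finetune_and_eval()
```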