PaddlePaddle / PaddleHub

Awesome pre-trained models toolkit based on PaddlePaddle. (400+ models including Image, Text, Audio, Video and Cross-Modal with Easy Inference & Serving) [Security hardening in progress; interaction suspended, please wait patiently]
https://www.paddlepaddle.org.cn/hub
Apache License 2.0
12.72k stars 2.08k forks

Dev score is 0.89 during training, but 0.29 when predicting through the predict interface #1155

Closed changchend closed 3 years ago

changchend commented 3 years ago

import paddlehub as hub
from paddlehub.dataset.base_nlp_dataset import TextMatchingDataset
import pandas as pd


class COVID19Competition(TextMatchingDataset):
    def __init__(self, tokenizer=None, max_seq_len=None):
        base_path = 'COVID19_sim_competition'
        super(COVID19Competition, self).__init__(
            is_pair_wise=True,  # matching type: whether the task is pairwise
            base_path=base_path,
            train_file="/data/qy/hub/COVID19_sim_competition/train.txt",  # file path relative to base_path
            dev_file="/data/qy/hub/COVID19_sim_competition/dev.txt",  # file path relative to base_path
            test_file="/data/qy/hub/COVID19_sim_competition/test.txt",
            train_file_with_header=True,
            dev_file_with_header=True,
            label_list=["0", "1"],
            tokenizer=tokenizer,
            max_seq_len=max_seq_len)


module = hub.Module(name="chinese-bert-wwm-ext")
inputs, outputs, program = module.context(trainable=True, max_seq_len=128, num_slots=3)
tokenizer = hub.BertTokenizer(vocab_file=module.get_vocab_path(), tokenize_chinese_chars=True)
dataset = COVID19Competition(tokenizer=tokenizer, max_seq_len=128)
strategy = hub.AdamWeightDecayStrategy(
    weight_decay=0.01,
    warmup_proportion=0.1,
    learning_rate=5e-5)
config = hub.RunConfig(
    log_interval=1000,
    eval_interval=3000,
    use_cuda=True,
    num_epoch=1,
    batch_size=16,
    checkpoint_dir='/data/qy/ckpt_ernie_pointwise_matching',
    strategy=strategy)

%%

query = outputs["sequence_output"]
left = outputs['sequence_output_2']
right = outputs['sequence_output_3']

pairwise_matching_task = hub.PairwiseTextMatchingTask(
    query_feature=query,
    left_feature=left,
    right_feature=right,
    tokenizer=tokenizer,
    dataset=dataset,
    config=config,
    metrics_choices=['f1'],
)

%%

run_states=pairwise_matching_task.finetune_and_eval()

%%

df = pd.read_table("/data/qy/hub/COVID19_sim_competition/dev.txt")
# Leave df itself untouched so that df['label'] is still available later
data = df[['text_a', 'text_b', 'text_c']].values.tolist()

%%

result = pairwise_matching_task.predict(
    data=data,
    max_seq_len=128,
    label_list=dataset.get_labels(),
    return_result=True,
    load_best_model=True,
    accelerate_mode=False)

%%

pred_true = df['label']

%%

from sklearn.metrics import f1_score, accuracy_score
print(f1_score(result, pred_true))
print(accuracy_score(result, pred_true))

changchend commented 3 years ago

Training was interrupted at one point; the first best_model had already been saved by then.

changchend commented 3 years ago

pairwise_matching_task.eval(load_best_model=True)
With the eval interface, the result matches the validation result from training.

changchend commented 3 years ago

results.append(batch_results[0].tolist()[0])

[2020-12-30 17:40:13,630] [ INFO] - PaddleHub predict start
[2020-12-30 17:40:13,630] [ INFO] - The best model has been loaded
[2020-12-30 17:44:27,030] [ INFO] - PaddleHub predict finished.

KPatr1ck commented 3 years ago

Hi, do you mean that the model performs worse at predict time? Please provide the exact versions of paddlepaddle and paddlehub.
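For anyone gathering the same details, here is a minimal sketch of one way to read the installed versions from package metadata. It assumes Python 3.8+ and that the packages were installed under the distribution names `paddlepaddle` (or `paddlepaddle-gpu` for the GPU build) and `paddlehub`; the helper name `installed_versions` is made up for this example.

```python
from importlib import metadata


def installed_versions(names=("paddlepaddle", "paddlepaddle-gpu", "paddlehub")):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed under this name.
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Reading the metadata avoids importing the frameworks themselves, so it also works in an environment where the import fails.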

changchend commented 3 years ago

During training:
[dev dataset evaluation result] loss=0.39535 f1=0.93372 [step/sec: 4.69]
[2020-12-31 08:54:11,475] [ EVAL] - best model saved to /data/qy/ckpt_ernie_pointwise_matching/best_model [best f1=0.93372]
But when I run the predict interface on the dev dataset separately, the result is much worse.
paddlepaddle version 1.8.0, paddlehub 1.8.1

KPatr1ck commented 3 years ago

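Did you prepare the dataset yourself? You could first try training on the dataset bundled with paddlehub 1.8, keeping everything else unchanged, and see whether the problem persists. https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.8/demo/pairwise_text_matching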

changchend commented 3 years ago

image
The run_states returned by eval and predict are different. Does that matter? Element 0 of the list is the same, but element 1 is different.

changchend commented 3 years ago

https://aistudio.baidu.com/bdvgpu/user/63646/1413353/notebooks/1413353.ipynb
This is the official example; accuracy drops after predict. Could you take a look on the backend and check whether I wrote something wrong?

KPatr1ck commented 3 years ago

I can't open this link. If convenient, please send me the original AI Studio tutorial and I'll check whether the problem can be reproduced.

changchend commented 3 years ago

https://aistudio.baidu.com/aistudio/projectdetail/709472?channelType=0&channel=0
I only changed the prediction part at the end to use dev.txt.
image

changchend commented 3 years ago

Passing text_a and text_b, or passing text_a, text_b and text_c, gives the same result.

KPatr1ck commented 3 years ago

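May I ask what business need on your side calls for a pairwise model? I'd like to understand the usage background and what the data looks like.

As for the issue itself, I have reproduced your situation. First, note that the predict interface only takes the first two elements of each list as text_a and text_b, so the result is the same whether or not your input includes text_c. Second, a pairwise matching task computes a matching score for each of the two sentence pairs separately and only then decides the label.

I have aligned the metrics in the example on my side; you can run the same test:

Evaluation on the dev dataset during training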

_ = pairwise_matching_task.eval(load_best_model=True)

[2021-01-11 16:19:57,867] [ INFO] - Load the best model from ckpt_ernie_pairtwise_matching/best_model
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py:1093: UserWarning: There are no operators in the program to be executed. If you pass Program manually, please use fluid.program_guard to ensure the current Program is being used.
  warnings.warn(error_info)
[2021-01-11 16:20:00,396] [ INFO] - Evaluation on dev dataset start
[2021-01-11 16:20:23,087] [ EVAL] - [dev dataset evaluation result] loss=0.46125 acc=0.85986 f1=0.86381 precision=0.85057 recall=0.87747 [step/sec: 44.07]
[2021-01-11 16:20:23,088] [ EVAL] - best model saved to ckpt_ernie_pairtwise_matching/best_model [best acc=0.85986]

Evaluation on the dev dataset by calling the predict interface manually

# Transform dev_examples into text pairs
examples = dataset.dev_examples
text_pairs = []
for example in examples:
    # print(example)
    text_pairs.append([example.text_a, example.text_b])
    text_pairs.append([example.text_a, example.text_c])

# Predict by PaddleHub's API
results = pairwise_matching_task.predict(
    data=text_pairs,
    max_seq_len=128,
    label_list=dataset.get_labels(),
    return_result=True,
    accelerate_mode=False)

labels = [int(example.label) for example in examples]
preds = []
for index in range(len(labels)):
    left_idx, right_idx = index*2, index*2+1
    left_score, right_score = results[left_idx][0], results[right_idx][1]    
    pred = 1 if left_score > right_score else 0
    preds.append(pred)

from sklearn.metrics import f1_score, accuracy_score
print(f'acc: {accuracy_score(labels, preds)}')
print(f'f1_score: {f1_score(labels, preds)}')

acc: 0.8598598598598599
f1_score: 0.8638132295719845
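The expand-then-compare logic above can also be tried without PaddleHub. The following is only a toy sketch: `toy_score` is a hypothetical word-overlap scorer standing in for the model, and the helper names and sample sentences are invented for illustration. It shows how each (text_a, text_b, text_c) triple becomes two pairs and how the two scores are turned back into one label.

```python
def expand_triples(triples):
    """Turn each (text_a, text_b, text_c) triple into two pairs, in order."""
    pairs = []
    for text_a, text_b, text_c in triples:
        pairs.append([text_a, text_b])
        pairs.append([text_a, text_c])
    return pairs


def labels_from_scores(scores):
    """scores[2*i] belongs to (a, b) and scores[2*i + 1] to (a, c).

    The label is 1 when the (a, b) pair scores higher, otherwise 0.
    """
    if len(scores) % 2 != 0:
        raise ValueError("expected two scores per triple")
    return [1 if scores[i] > scores[i + 1] else 0
            for i in range(0, len(scores), 2)]


def toy_score(pair):
    """Hypothetical stand-in for the model: number of shared words."""
    left, right = pair
    return float(len(set(left.split()) & set(right.split())))


triples = [
    ("fever and cough", "cough with fever", "broken arm"),
    ("how to treat flu", "weather today", "flu treatment how"),
]
pairs = expand_triples(triples)
scores = [toy_score(pair) for pair in pairs]
preds = labels_from_scores(scores)
print(preds)
```

With a real model, `scores` would come from the per-pair outputs of predict instead of `toy_score`.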

changchend commented 3 years ago

image
results[left_idx] and results[right_idx] have no [0] or [1] element. Does results need to return probabilities?

KPatr1ck commented 3 years ago

image

You need to manually modify paddlehub/finetune/task/matching_task.py so that it returns the probability values.

changchend commented 3 years ago

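Thanks, the problem is solved. I wanted to compare how the two models, PairwiseTextMatchingTask and PointwiseTextMatchingTask, differ in how they handle the data. The documentation says nothing about their pros and cons for matching problems.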
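As a rough illustration of the general difference between the two formulations (generic textbook losses, not taken from PaddleHub's implementation, which may differ): a pointwise setup typically classifies one text pair at a time, while a pairwise setup typically trains the better candidate to outscore the worse one by a margin. The function names and the default margin below are assumptions chosen for this sketch.

```python
import math


def pointwise_loss(score, label):
    """Binary cross-entropy for one (text_a, text_b) pair; score is a raw logit."""
    prob = 1.0 / (1.0 + math.exp(-score))
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def pairwise_loss(pos_score, neg_score, margin=1.0):
    """Margin ranking (hinge) loss over two candidates for the same query.

    The loss is zero once the better candidate outscores the worse one
    by at least `margin`.
    """
    return max(0.0, margin - (pos_score - neg_score))


print(pointwise_loss(0.0, 1))   # an undecided logit on a positive pair
print(pairwise_loss(2.0, 0.5))  # already separated by more than the margin
print(pairwise_loss(0.2, 0.5))  # wrong order, so the loss is positive
```

Under these assumptions, the pointwise form yields a standalone match probability per pair, whereas the pairwise form only learns a relative ordering, which is why its label has to be derived by comparing two scores.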