changchend commented 3 years ago

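PaddleHub 1.8.1. A checkpoint was saved earlier to ckpt_ernie_pointwise_matching.

text_pairs = [["这家餐厅很好吃", "这部电影真的很差劲"]]

print(pointwise_matching_task.predict(data=text_pairs, max_seq_len=128, label_list=dataset.get_labels(), return_result=True, load_best_model=True, accelerate_mode=False))

Every time I rerun the whole program (not just this predict call), the prediction changes: the probabilities, and sometimes the label, differ between runs.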

changchend commented 3 years ago

(['0'], [[0.5246255993843079, 0.47537437081336975]])
(['0'], [[0.6127048134803772, 0.387295126914978]])
(['1'], [[0.26533350348472595, 0.7346665263175964]])

changchend commented 3 years ago

[2020-12-23 16:08:12,333] [ INFO] - Installing chinese-bert-wwm-ext module
[2020-12-23 16:08:12,450] [ INFO] - Module chinese-bert-wwm-ext already installed in /root/.paddlehub/modules/chinese_bert_wwm_ext
W1223 16:08:17.359491 18447 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.2, Runtime API Version: 10.0
W1223 16:08:17.365764 18447 device_context.cc:260] device: 0, cuDNN Version: 7.6.
[2020-12-23 16:08:49,074] [ INFO] - Checkpoint dir: ckpt_ernie_pointwise_matching
[2020-12-23 16:08:49,571] [ INFO] - PaddleHub predict start
[2020-12-23 16:08:49,571] [ INFO] - Load the best model from ckpt_ernie_pointwise_matching/best_model
/home/software/Anaconda/anaconda3/envs/qy/lib/python3.7/site-packages/paddle/fluid/executor.py:1093: UserWarning: There are no operators in the program to be executed. If you pass Program manually, please use fluid.program_guard to ensure the current Program is being used.
  warnings.warn(error_info)
[2020-12-23 16:08:51,065] [ INFO] - Try loading checkpoint from ckpt_ernie_pointwise_matching/ckpt.meta
[2020-12-23 16:08:51,065] [ INFO] - PaddleHub model checkpoint not found, start from scratch...
[2020-12-23 16:08:51,533] [ INFO] - PaddleHub predict finished.
(['1'], [[0.26533350348472595, 0.7346665263175964]])

changchend commented 3 years ago

    import paddlehub as hub
    from paddlehub.dataset.base_nlp_dataset import TextMatchingDataset


    class COVID19Competition(TextMatchingDataset):
        def __init__(self, tokenizer=None, max_seq_len=None):
            base_path = 'COVID19_sim_competition'
            super(COVID19Competition, self).__init__(
                is_pair_wise=False,  # text matching type: whether it is pairwise
                base_path=base_path,
                train_file="/data/qy/hub/COVID19_sim_competition/train.txt",  # file path relative to base_path
                dev_file="/data/qy/hub/COVID19_sim_competition/dev.txt",  # file path relative to base_path
                train_file_with_header=True,
                dev_file_with_header=True,
                label_list=["0", "1"],
                tokenizer=tokenizer,
                max_seq_len=max_seq_len)


    module = hub.Module(name="chinese-bert-wwm-ext")

    # A pointwise task needs: query, title_left (2 slots)
    inputs, outputs, program = module.context(
        trainable=True, max_seq_len=128, num_slots=2)

    tokenizer = hub.BertTokenizer(
        vocab_file=module.get_vocab_path(), tokenize_chinese_chars=True)

    dataset = COVID19Competition(tokenizer=tokenizer, max_seq_len=128)

    strategy = hub.L2SPFinetuneStrategy(
        learning_rate=5e-5,
        optimizer_name="adam",
        regularization_coeff=1e-3)

    config = hub.RunConfig(
        log_interval=1000,
        eval_interval=3000,
        use_cuda=True,
        num_epoch=1,
        batch_size=32,
        checkpoint_dir='ckpt_ernie_pointwise_matching',
        strategy=strategy)

    # Build the transfer network using ERNIE's token-level output
    query = outputs["sequence_output"]
    title = outputs['sequence_output_2']

    # Create the pointwise text matching task
    pointwise_matching_task = hub.PointwiseTextMatchingTask(
        dataset=dataset,
        query_feature=query,
        title_feature=title,
        tokenizer=tokenizer,
        config=config,
        metrics_choices=['f1'])

    # Sample data for prediction
    text_pairs = [["这家餐厅很好吃", "这部电影真的很差劲"]]

    print(pointwise_matching_task.predict(
        data=text_pairs,
        max_seq_len=128,
        label_list=dataset.get_labels(),
        return_result=True,
        load_best_model=True,
        accelerate_mode=False))

KPatr1ck commented 3 years ago

[2020-12-23 16:08:51,065] [ INFO] - Try loading checkpoint from ckpt_ernie_pointwise_matching/ckpt.meta
[2020-12-23 16:08:51,065] [ INFO] - PaddleHub model checkpoint not found, start from scratch...

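These lines show that the model was not loaded successfully, so the downstream network parameters of the model you are running are randomly initialized, which is why the predictions change from run to run. I suggest checking whether the checkpoint path is set correctly.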
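As a rough way to check the path before calling predict, something like the sketch below may help. It only looks for the two names that appear in the log above (ckpt.meta and best_model); the function name inspect_checkpoint_dir is made up for this example and is not part of PaddleHub. Since checkpoint_dir is given as a relative path, it is resolved against the current working directory, so running the script from a different directory than the one used for training could by itself explain a missing checkpoint.

```python
from pathlib import Path


def inspect_checkpoint_dir(checkpoint_dir):
    """Report whether the entries named in the PaddleHub log exist.

    Hypothetical helper for illustration; it does not call PaddleHub.
    """
    # A relative path is resolved against the current working directory.
    root = Path(checkpoint_dir).resolve()
    return {
        "resolved_dir": str(root),
        "dir_exists": root.is_dir(),
        # Names taken from the log: ".../ckpt.meta" and ".../best_model".
        "has_ckpt_meta": (root / "ckpt.meta").exists(),
        "has_best_model": (root / "best_model").exists(),
    }


if __name__ == "__main__":
    report = inspect_checkpoint_dir("ckpt_ernie_pointwise_matching")
    for key, value in report.items():
        print(f"{key}: {value}")
```

If has_ckpt_meta or has_best_model comes back False, the path printed as resolved_dir is where the program is actually looking, which can be compared with where the checkpoint was saved.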

changchend commented 3 years ago

OK, thank you.

PaddlePaddle / PaddleHub

Results differ on every run of the program #1138
