PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

The process of ranking/ernie_matching is confusing. How to generate the .csv file with the "prob" field? #1920

Closed ArtificialZeng closed 1 year ago

ArtificialZeng commented 2 years ago

https://github.com/ArtificialZeng/PaddleNLP/tree/develop/applications/neural_search/ranking/ernie_matching

Look at step 7 in ranking/ernie_matching. The prob value is already given there, but the docs do not explain how to generate that kind of .csv file with the prob field.

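7. Prepare the prediction data. The data to be predicted is a tab-separated tsv file. Each line contains one text pair and the semantic-index similarity of that pair. Some examples: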

中西方语言与文化的差异	第二语言习得的一大障碍就是文化差异。	0.5160342454910278
中西方语言与文化的差异	跨文化视角下中国文化对外传播路径琐谈跨文化,中国文化,传播,翻译	0.5145505666732788
中西方语言与文化的差异	从中西方民族文化心理的差异看英汉翻译语言,文化,民族文化心理,思维方式,翻译	0.5141439437866211
中西方语言与文化的差异	中英文化差异对翻译的影响中英文化,差异,翻译的影响	0.5138794183731079
中西方语言与文化的差异	浅谈文化与语言习得文化,语言,文化与语言的关系,文化与语言习得意识,跨文化交际	0.5131710171699524
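For readers unsure about the layout, here is a minimal sketch of reading a file in the three-column format described in step 7 (query, title, similarity, separated by tabs). The function name `read_pairs` is hypothetical and is not part of PaddleNLP; how the project's own predict.py parses the file is not shown in this thread.

```python
import io


def read_pairs(stream):
    """Yield (query, title, prob) tuples from a tab-separated text stream.

    Hypothetical helper for illustration only; not a PaddleNLP API.
    """
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"expected 3 tab-separated fields, got {len(fields)}")
        query, title, prob = fields
        yield query, title, float(prob)


# One row taken from the example data in step 7.
sample = "中西方语言与文化的差异\t第二语言习得的一大障碍就是文化差异。\t0.5160342454910278\n"
pairs = list(read_pairs(io.StringIO(sample)))
```

To read a real file, pass an open file object instead, for example `open(path, encoding="utf-8")`, where `path` is whatever file you prepared.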

ArtificialZeng commented 2 years ago

I tried deleting the last 'prob' field, but then the ranking script predict.py does not work. A PaddleNLP engineer told me that this file is merely a ranking input file. However, the project does not seem to include a program that generates its own .csv with the prob field.

w5688414 commented 2 years ago

Actually, the prob column is not used by the ranking model; it is extracted from the Milvus engine. The ranking model only requires the query and the title, so you can set the prob column to zero, like this: 中西方语言与文化的差异 第二语言习得的一大障碍就是文化差异。 0
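A minimal sketch of that workaround, assuming you already have (query, title) pairs and only need a file with a placeholder third column. The helper name `to_rank_line` and the sample pair list are hypothetical and not part of PaddleNLP.

```python
def to_rank_line(query, title, prob=0):
    """Format one tab-separated row: query, title, placeholder prob.

    Hypothetical helper for illustration only; not a PaddleNLP API.
    """
    for field in (query, title):
        # A tab or newline inside a field would break the three-column layout.
        if "\t" in field or "\n" in field:
            raise ValueError("fields must not contain tabs or newlines")
    return f"{query}\t{title}\t{prob}"


# Example pair taken from the thread; the prob column is set to 0.
pairs = [("中西方语言与文化的差异", "第二语言习得的一大障碍就是文化差异。")]
lines = [to_rank_line(query, title) for query, title in pairs]
content = "\n".join(lines) + "\n"
```

Writing `content` to a UTF-8 file yields rows with the same three-column shape as the step 7 example, with 0 in place of the similarity score.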

MingQuanXu123 commented 2 years ago

> Actually, the prob column is not used by the ranking model; it is extracted from the Milvus engine. The ranking model only requires the query and the title, so you can set the prob column to zero, like this: 中西方语言与文化的差异 第二语言习得的一大障碍就是文化差异。 0

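However, the background introduction says that the pair-wise matching model suits scenarios where the text-pair similarity is fed, as one of the features, into an upper-level ranking module. The third column should be the text similarity. If it is set to 0, does that mean the text-similarity feature is not being used at all?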

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 14 days since being marked as stale.