Open PNightOwlY opened 1 year ago
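Hello. Multi-way recall was only added in version 0.5, so you need to upgrade to 0.5 before you can use it. The Docker image has to be rebuilt from the latest Paddle Docker image by following the tutorial: https://github.com/PaddlePaddle/PaddleNLP/tree/develop/pipelines/docker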
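For anyone who wants to confirm the upgrade took effect, here is a minimal sketch that compares an installed version against the 0.5 minimum mentioned above. The distribution name "paddle-pipelines" is an assumption on my part (check it against your own pip list), and the helper names are made up for illustration.

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.5.0' or '2.3.0.dev0' into a tuple of its leading integer parts."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'dev0'
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum="0.5"):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed distribution name; adjust if your environment uses another one.
    found = installed_version("paddle-pipelines")
    if found is None:
        print("pipelines package not found")
    elif meets_minimum(found):
        print("version", found, "should include multi-way recall")
    else:
        print("version", found, "is older than 0.5, please upgrade")
```

The comparison ignores pre-release suffixes, so a dev build of 0.5 is treated the same as 0.5 itself.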
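Thanks for the reply! I downloaded the paddle installation package and replaced all of the missing environment packages, and that worked too.
However, I ran into a quality problem yesterday. I tested both base and nano on the medical dataset and found that base performs worse than nano. What could be the reason?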
base configuration:
version: '1.1.0'
components:    # define all the building-blocks for Pipeline
  - name: DocumentStore
    type: ElasticsearchDocumentStore  # consider using MilvusDocumentStore or WeaviateDocumentStore for scaling to large number of documents
    params:
      host: 172.18.159.16
      port: 9200
      index: ccks_base_encoder
      embedding_dim: 768
  - name: Retriever
    type: DensePassageRetriever
    params:
      document_store: DocumentStore    # params can reference other components defined in the YAML
      top_k: 10
      query_embedding_model: rocketqa-zh-base-query-encoder
      passage_embedding_model: rocketqa-zh-base-para-encoder
      embed_title: False
  - name: Ranker       # custom-name for the component; helpful for visualization & debugging
    type: ErnieRanker    # pipelines Class name for the component
    params:
      model_name_or_path: rocketqa-base-cross-encoder
      top_k: 3
  - name: TextFileConverter
    type: TextConverter
  - name: ImageFileConverter
    type: ImageToTextConverter
  - name: PDFFileConverter
    type: PDFToTextConverter
  - name: DocxFileConverter
    type: DocxToTextConverter
  - name: Preprocessor
    type: PreProcessor
    params:
      split_by: word
      split_length: 1000
  - name: FileTypeClassifier
    type: FileTypeClassifier
pipelines:
  - name: query    # a sample extractive-qa Pipeline
    type: Query
    nodes:
      - name: Retriever
        inputs: [Query]
      - name: Ranker
        inputs: [Retriever]
  - name: indexing
    type: Indexing
    nodes:
      - name: FileTypeClassifier
        inputs: [File]
      - name: TextFileConverter
        inputs: [FileTypeClassifier.output_1]
      - name: PDFFileConverter
        inputs: [FileTypeClassifier.output_2]
      - name: DocxFileConverter
        inputs: [FileTypeClassifier.output_4]
      - name: ImageFileConverter
        inputs: [FileTypeClassifier.output_6]
      - name: Preprocessor
        inputs: [PDFFileConverter, TextFileConverter, DocxFileConverter, ImageFileConverter]
      - name: Retriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [Retriever]
nano configuration:
version: '1.1.0'
components:    # define all the building-blocks for Pipeline
  - name: DocumentStore
    type: ElasticsearchDocumentStore  # consider using MilvusDocumentStore or WeaviateDocumentStore for scaling to large number of documents
    params:
      host: 172.18.159.16
      port: 9200
      index: ccks_encoder
      embedding_dim: 312
  - name: Retriever
    type: DensePassageRetriever
    params:
      document_store: DocumentStore    # params can reference other components defined in the YAML
      top_k: 10
      query_embedding_model: rocketqa-zh-nano-query-encoder
      passage_embedding_model: rocketqa-zh-nano-para-encoder
      embed_title: False
  - name: Ranker       # custom-name for the component; helpful for visualization & debugging
    type: ErnieRanker    # pipelines Class name for the component
    params:
      model_name_or_path: rocketqa-nano-cross-encoder
      top_k: 3
  - name: TextFileConverter
    type: TextConverter
  - name: ImageFileConverter
    type: ImageToTextConverter
  - name: PDFFileConverter
    type: PDFToTextConverter
  - name: DocxFileConverter
    type: DocxToTextConverter
  - name: Preprocessor
    type: PreProcessor
    params:
      split_by: word
      split_length: 1000
  - name: FileTypeClassifier
    type: FileTypeClassifier
pipelines:
  - name: query    # a sample extractive-qa Pipeline
    type: Query
    nodes:
      - name: Retriever
        inputs: [Query]
      - name: Ranker
        inputs: [Retriever]
  - name: indexing
    type: Indexing
    nodes:
      - name: FileTypeClassifier
        inputs: [File]
      - name: TextFileConverter
        inputs: [FileTypeClassifier.output_1]
      - name: PDFFileConverter
        inputs: [FileTypeClassifier.output_2]
      - name: DocxFileConverter
        inputs: [FileTypeClassifier.output_4]
      - name: ImageFileConverter
        inputs: [FileTypeClassifier.output_6]
      - name: Preprocessor
        inputs: [PDFFileConverter, TextFileConverter, DocxFileConverter, ImageFileConverter]
      - name: Retriever
        inputs: [Preprocessor]
      - name: DocumentStore
        inputs: [Retriever]
Do you have concrete numbers? In our evaluation base is stronger than nano, so please double-check on your side.
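One way to produce concrete numbers for such a comparison is to score both setups on the same labelled queries. The sketch below computes recall@k and MRR from ranked document ids. It is not taken from the pipelines code base; the function names and the toy ids are hypothetical, and you would need to fill in the ranked lists from your own base and nano runs.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    top = set(ranked_ids[:k])
    hits = sum(1 for doc_id in relevant_ids if doc_id in top)
    return hits / len(relevant_ids)


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant document, or 0.0 if none was returned."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def evaluate(runs, gold, k=10):
    """Average recall@k and MRR over all labelled queries.

    runs: {query: [doc_id, ...]} ranked results from one setup
    gold: {query: {doc_id, ...}} relevant documents per query
    """
    if not gold:
        return {"recall@k": 0.0, "mrr": 0.0}
    n = len(gold)
    recall = sum(recall_at_k(runs.get(q, []), rel, k) for q, rel in gold.items()) / n
    mrr = sum(reciprocal_rank(runs.get(q, []), rel) for q, rel in gold.items()) / n
    return {"recall@k": recall, "mrr": mrr}


if __name__ == "__main__":
    # Hypothetical toy data, only to show the call shape.
    gold = {"q1": {"d1"}, "q2": {"d5"}}
    base_run = {"q1": ["d1", "d2"], "q2": ["d3", "d5"]}
    print(evaluate(base_run, gold, k=1))
```

Running the same gold labels against the base results and the nano results gives two comparable score sets, which is easier to discuss than an overall impression of quality.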
Please ask your question
docker pull registry.baidubce.com/paddlepaddle/paddlenlp:2.4.0-gpu-cuda10.2-cudnn7
nvidia-docker run -d --name paddlenlp_pipelines_gpu --net host -ti registry.baidubce.com/paddlepaddle/paddlenlp:2.4.0-gpu-cuda10.2-cudnn7
That is the GPU image I installed. Running pip list | grep paddle shows the following paddle versions:
paddle-bfloat 0.1.7
paddle2onnx 0.9.8
paddlefsl 1.1.0
paddlenlp 2.3.0.dev0
paddleocr 2.5.0.3
paddlepaddle-gpu 2.3.1
When I run the multi-way recall example, the BM25Retriever node cannot be found.
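A quick way to narrow this down is to check whether the installed package exposes the class at all. The sketch below is a generic helper for that. The module path pipelines.nodes is my assumption about where the retriever classes live, so adjust it if your installation differs.

```python
import importlib


def has_component(module_name, class_name):
    """Return True if module_name can be imported and defines class_name."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False  # the module itself is missing or failed to import
    return hasattr(module, class_name)


if __name__ == "__main__":
    # "pipelines.nodes" is an assumed module path for the node classes.
    for name in ("BM25Retriever", "DensePassageRetriever"):
        print(name, has_component("pipelines.nodes", name))
```

If DensePassageRetriever is reported as present while BM25Retriever is not, the installed pipelines package most likely predates the node, rather than the example being misconfigured.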