PaddlePaddle / Paddle

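PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of PaddlePaddle (飞桨): high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)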
http://www.paddlepaddle.org/
Apache License 2.0

Using the PaddleNLP pipeline for neural_search: replacing the retriever's embedding model with my own model raises an error #54445

Open nanchi-ibm opened 1 year ago

nanchi-ibm commented 1 year ago

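Please ask your question

I saw the official guidance that the default embedding model in the retriever can be replaced with your own. I first tried the embedding models that ship with Taskflow, such as rocketqa-zh-dureader-cross-encoder, and they worked without problems. However, after I downloaded a different model myself, with the basic model files present in the model directory, setting query_embedding_model to the new model's path keeps raising an error. What is causing this?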

andyjiang1116 commented 1 year ago

Hello, could you share the full error message and code that reproduces it?

nanchi-ibm commented 1 year ago

Hello, could you share the full error message and code that reproduces it?

I tried using ernie-3.0-nano-zh for the embedding; the error is now NotImplementedError.

Details:

[2023-06-05 17:16:28,815] [ INFO] - start to convert pytorch weight file</Users/winifred/modeltest/ernie-3.0-nano-zh/pytorch_model.bin> to paddle weight file</Users/modeltest/ernie-3.0-nano-zh/model_state.pdparams> ...
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/paddle_pipelines-1.0.0-py3.10.egg/pipelines/pipelines/base.py", line 794, in _load_or_get_component
    instance = BaseComponent.load_from_args(component_type=component_type, **component_params)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/paddle_pipelines-1.0.0-py3.10.egg/pipelines/nodes/base.py", line 63, in load_from_args
    instance = subclass(**kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/paddle_pipelines-1.0.0-py3.10.egg/pipelines/nodes/retriever/dense.py", line 139, in __init__
    pretrained_model = AutoModel.from_pretrained(query_embedding_model)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/paddlenlp/transformers/auto/modeling.py", line 478, in from_pretrained
    return cls._from_pretrained(pretrained_model_name_or_path, task, *model_args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/paddlenlp/transformers/auto/modeling.py", line 324, in _from_pretrained
    return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/paddlenlp/transformers/model_utils.py", line 484, in from_pretrained
    return cls.from_pretrained_v2(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/paddlenlp/transformers/model_utils.py", line 1349, in from_pretrained_v2
    model_state_dict = cls.convert(model_weight_file, config, cache_dir)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/paddlenlp/transformers/conversion_utils.py", line 712, in convert
    name_mappings = cls._get_name_mappings(config)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/paddlenlp/transformers/conversion_utils.py", line 749, in _get_name_mappings
    raise NotImplementedError
NotImplementedError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/workspace/kb-chatbot-api/rest_api/application.py", line 29, in <module>
    from rest_api.controller.router import router as api_router
  File "/Users/workspace/kb-chatbot-api/./rest_api/controller/router.py", line 17, in <module>
    from rest_api.controller import file_upload, search, feedback, document
  File "/Users/workspace/kb-chatbot-api/./rest_api/controller/file_upload.py", line 75, in <module>
    INDEXING_PIPELINE = Pipeline.load_from_yaml(Path(PIPELINE_YAML_PATH), pipeline_name=INDEXING_PIPELINE_NAME)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/paddle_pipelines-1.0.0-py3.10.egg/pipelines/pipelines/base.py", line 265, in load_from_yaml
    return cls.load_from_config(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/paddle_pipelines-1.0.0-py3.10.egg/pipelines/pipelines/base.py", line 760, in load_from_config
    component = cls._load_or_get_component(name=name, definitions=component_definitions, components=components)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/paddle_pipelines-1.0.0-py3.10.egg/pipelines/pipelines/base.py", line 797, in _load_or_get_component
    raise Exception(f"Failed loading pipeline component '{name}': {e}")
Exception: Failed loading pipeline component 'Retriever':
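Editor's note: the log line at the top of the traceback suggests that the local model directory held PyTorch weights (pytorch_model.bin), so the loader attempted a PyTorch-to-Paddle conversion, and the conversion step is where NotImplementedError was raised. A quick way to check which weight format a directory holds before pointing query_embedding_model at it might look like the sketch below. It uses only the two file names that appear in the log; the helper name `detect_weight_format` is hypothetical and is not part of PaddleNLP or pipelines.

```python
from pathlib import Path

# File names taken from the log line in this thread; other files are ignored.
PADDLE_WEIGHTS = "model_state.pdparams"
PYTORCH_WEIGHTS = "pytorch_model.bin"


def detect_weight_format(model_dir):
    """Return 'paddle', 'pytorch', or 'none' for a local model directory.

    Hypothetical helper: it only inspects file names and loads nothing.
    """
    root = Path(model_dir)
    if (root / PADDLE_WEIGHTS).is_file():
        return "paddle"   # native weights present, no conversion needed
    if (root / PYTORCH_WEIGHTS).is_file():
        return "pytorch"  # loading would have to go through conversion
    return "none"
```

If this returns "pytorch" for the directory in question, the failure in this thread is at least consistent with a missing conversion mapping for that model class rather than with a wrong path.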

w5688414 commented 1 year ago

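Hello, the retriever is currently implemented on top of Taskflow's feature extraction. To load a model other than the rocketqa ones, you need to implement a pipelines node yourself that loads ernie3.0 and extracts the vectors. See this tutorial for details: https://aistudio.baidu.com/aistudio/projectdetail/5011119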
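Editor's note: as a rough illustration of the shape such a node could take, the sketch below wraps an injected encoder function and returns one normalized vector per input text. Everything in it is an assumption made for illustration: `CustomEmbeddingNode`, `embed`, and `toy_encoder` are hypothetical names, the class does not inherit from the real pipelines base class, and the hash-based encoder merely stands in for a real model such as ernie-3.0-nano-zh. The linked tutorial remains the reference for the actual node interface.

```python
import hashlib
import math
from typing import Callable, List, Sequence

Vector = List[float]


def toy_encoder(text: str, dim: int = 8) -> Vector:
    """Stand-in for a real model: hash the text into a deterministic vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


class CustomEmbeddingNode:
    """Hypothetical node; NOT the pipelines base class, only a rough shape."""

    def __init__(self, encoder: Callable[[str], Vector]):
        # The encoder is injected so a real model can replace the toy one.
        self.encoder = encoder

    def embed(self, texts: Sequence[str]) -> List[Vector]:
        """Encode each text and L2-normalize the resulting vector."""
        vectors = []
        for text in texts:
            raw = self.encoder(text)
            # Fall back to 1.0 so an all-zero vector does not divide by zero.
            norm = math.sqrt(sum(x * x for x in raw)) or 1.0
            vectors.append([x / norm for x in raw])
        return vectors


node = CustomEmbeddingNode(toy_encoder)
vectors = node.embed(["什么是飞桨", "PaddleNLP neural search"])
```

Injecting the encoder keeps the wrapper independent of how the checkpoint is loaded, so the same class could be exercised with the toy encoder in tests and with a real model in the pipeline.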

nanchi-ibm commented 1 year ago

Thank you very much, I'll look into it.