[Question]: End-to-end semantic retrieval system: the code example for building a semantic retrieval system on the DuReader-Robust dataset fails to run #4948

Closed huangbo-bo closed 6 months ago

huangbo-bo commented 1 year ago


Command run: python examples/semantic-search/ --device cpu --search_engine faiss

Error:

INFO - pipelines.utils.preprocessing - Converting data/dureader_dev/dureader_dev/dev250.txt
INFO - pipelines.utils.preprocessing - Converting data/dureader_dev/dureader_dev/dev716.txt
INFO - pipelines.utils.preprocessing - Converting data/dureader_dev/dureader_dev/dev62.txt
INFO - pipelines.utils.preprocessing - Converting data/dureader_dev/dureader_dev/dev1165.txt
INFO - pipelines.document_stores.faiss - document_cnt:0 embedding_cnt:0
Writing Documents: 2000it [00:04, 492.30it/s]
INFO - pipelines.utils.common_utils - Using devices: PLACE(CPU)
INFO - pipelines.utils.common_utils - Number of GPUs: 0
Traceback (most recent call last):
  File "examples/semantic-search/", line 185, in <module>
    semantic_search_tutorial()
  File "examples/semantic-search/", line 164, in semantic_search_tutorial
    retriever = get_faiss_retriever(use_gpu)
  File "examples/semantic-search/", line 87, in get_faiss_retriever
    embed_title=False,
  File "/usr/local/python3/lib/python3.7/site-packages/pipelines/nodes/retriever/", line 154, in __init__
    share_parameters=share_parameters,
TypeError: __init__() got an unexpected keyword argument 'output_emb_size'
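The `TypeError` above means the retriever code passed `output_emb_size` to a constructor whose installed version does not declare that parameter, which usually points to mismatched package versions. As a minimal sketch of how to confirm this kind of mismatch, the helper below inspects a callable's signature. `OldEncoder` and `NewEncoder` are hypothetical stand-ins, not the real paddlenlp classes; in practice you would pass the class named in your own traceback.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        return params[name].kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        )
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for an older and a newer release of the same class.
class OldEncoder:
    def __init__(self, model_name, share_parameters=False):
        self.model_name = model_name
        self.share_parameters = share_parameters


class NewEncoder:
    def __init__(self, model_name, share_parameters=False, output_emb_size=None):
        self.model_name = model_name
        self.share_parameters = share_parameters
        self.output_emb_size = output_emb_size


if __name__ == "__main__":
    for cls in (OldEncoder, NewEncoder):
        print(cls.__name__, accepts_keyword(cls, "output_emb_size"))
```

If the check returns `False` for the installed class, the caller and the library are likely from incompatible releases.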

Environment: OS: CentOS Linux release 7.9.2009 (Core); Python: 3.7.16; pip: 23.0.1

w5688414 commented 1 year ago


huangbo-bo commented 1 year ago


After upgrading paddlenlp to 2.5.1, a different error is raised:

INFO - pipelines.utils.preprocessing - Converting data/dureader_dev/dureader_dev/dev1165.txt
INFO - pipelines.document_stores.faiss - document_cnt:0 embedding_cnt:0
Writing Documents: 2000it [00:03, 518.72it/s]
INFO - pipelines.utils.common_utils - Using devices: PLACE(CPU)
INFO - pipelines.utils.common_utils - Number of GPUs: 0
[2023-02-23 09:31:59,589] [ INFO] - Model config ErnieConfig
{ "attention_probs_dropout_prob": 0.1, "enable_recompute": false, "fuse": false, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 312, "initializer_range": 0.02, "intermediate_size": 1248, "layer_norm_eps": 1e-12, "max_position_embeddings": 2048, "model_type": "ernie", "num_attention_heads": 12, "num_hidden_layers": 4, "pad_token_id": 0, "paddlenlp_version": null, "pool_act": "tanh", "task_id": 0, "task_type_vocab_size": 16, "type_vocab_size": 4, "use_task_id": true, "vocab_size": 40000 }

[2023-02-23 09:31:59,589] [ INFO] - Found /root/.paddlenlp/models/rocketqa-zh-nano-query-encoder/rocketqa-zh-nano-query-encoder.pdparams
Traceback (most recent call last):
  File "examples/semantic-search/", line 185, in <module>
    semantic_search_tutorial()
  File "examples/semantic-search/", line 164, in semantic_search_tutorial
    retriever = get_faiss_retriever(use_gpu)
  File "examples/semantic-search/", line 87, in get_faiss_retriever
    embed_title=False,
  File "/usr/local/python3/lib/python3.7/site-packages/pipelines/nodes/retriever/", line 154, in __init__
    share_parameters=share_parameters,
  File "/usr/local/python3/lib/python3.7/site-packages/paddlenlp/transformers/semantic_search/", line 97, in __init__
    self.query_ernie = ErnieEncoder.from_pretrained(query_model_name_or_path, output_emb_size=output_emb_size)
  File "/usr/local/python3/lib/python3.7/site-packages/paddlenlp/transformers/", line 486, in from_pretrained
    pretrained_model_name_or_path, from_hf_hub=from_hf_hub, subfolder=subfolder, *args, kwargs
  File "/usr/local/python3/lib/python3.7/site-packages/paddlenlp/transformers/", line 1346, in from_pretrained_v2
    support_conversion=support_conversion,
  File "/usr/local/python3/lib/python3.7/site-packages/paddlenlp/transformers/", line 1026, in _resolve_model_file_path
    weight_file_path = get_path_from_url_with_filelock(pretrained_model_name_or_path, cache_dir)
  File "/usr/local/python3/lib/python3.7/site-packages/paddlenlp/utils/", line 167, in get_path_from_url_with_filelock
    result = get_path_from_url(url=url, root_dir=root_dir, md5sum=md5sum, check_exist=check_exist)
  File "/usr/local/python3/lib/python3.7/site-packages/paddlenlp/utils/", line 134, in get_path_from_url
    if tarfile.is_tarfile(fullpath) or zipfile.is_zipfile(fullpath):
  File "/usr/local/python3/lib/python3.7/", line 2442, in is_tarfile
    t = open(name)
  File "/usr/local/python3/lib/python3.7/", line 1575, in open
    return func(name, "r", fileobj, kwargs)
  File "/usr/local/python3/lib/python3.7/", line 1702, in xzopen
    t = cls.taropen(name, mode, fileobj, kwargs)
  File "/usr/local/python3/lib/python3.7/", line 1623, in taropen
    return cls(name, mode, fileobj, kwargs)
  File "/usr/local/python3/lib/python3.7/", line 1486, in __init__
    self.firstmember =
  File "/usr/local/python3/lib/python3.7/", line 2289, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/usr/local/python3/lib/python3.7/", line 1094, in fromtarfile
    buf =
  File "/usr/local/python3/lib/python3.7/", line 206, in read
    return
  File "/usr/local/python3/lib/python3.7/", line 68, in readinto
    data =
  File "/usr/local/python3/lib/python3.7/", line 96, in read
    if self._decompressor.needs_input:
AttributeError: '_lzma.LZMADecompressor' object has no attribute 'needs_input'
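The last frames of this traceback are inside the standard library (`tarfile` probing the cached file via the xz/LZMA path), not inside paddlenlp itself. `needs_input` has been part of `lzma.LZMADecompressor` since Python 3.5, so its absence on Python 3.7 may indicate that the interpreter is loading a mismatched `_lzma` extension module. That reading is an assumption and is not confirmed anywhere in this thread. A minimal diagnostic sketch to check it in the affected environment:

```python
import lzma
import sys


def lzma_decompressor_has_needs_input() -> bool:
    """Return True if the loaded LZMA decompressor exposes `needs_input`."""
    return hasattr(lzma.LZMADecompressor(), "needs_input")


if __name__ == "__main__":
    print("Python:", sys.version.split()[0])
    # Shows which lzma.py is in use; the C extension is imported by that module.
    print("lzma module loaded from:", getattr(lzma, "__file__", "unknown"))
    print("needs_input available:", lzma_decompressor_has_needs_input())
```

If this prints `False`, the problem would lie with the Python installation rather than with the downloaded model files.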

w5688414 commented 1 year ago


rm -rf ~/.paddlenlp
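The command above deletes the local paddlenlp cache (the log shows model files under `/root/.paddlenlp/models/`), so the next run downloads the models again. The thread does not state why this helps; one plausible assumption is that a cached file was incomplete or corrupted. For anyone who prefers to do the same from Python, here is a rough equivalent. The helper name `clear_cache` is made up for illustration and is not part of paddlenlp.

```python
import shutil
from pathlib import Path


def clear_cache(cache_dir) -> bool:
    """Delete `cache_dir` recursively; return False if it does not exist."""
    path = Path(cache_dir).expanduser()
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True


# Usage (destructive, so left commented out):
# clear_cache("~/.paddlenlp")
```

Unlike `rm -rf`, this sketch reports whether anything was actually removed, which makes it easier to tell if the cache path was correct.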