PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

AssertionError when running dense_faq_example.py #3243

Closed Ywandung-Lyou closed 2 years ago

Ywandung-Lyou commented 2 years ago

I installed PaddlePaddle 2.3.2 on Windows 11 and ran `python examples/question-answering/dense_qa_example.py --device cpu` and `python examples/frequently-asked-question/dense_faq_example.py --device cpu`; both raised an AssertionError. The problem does not occur when running `python examples/semantic-search/semantic_search_example.py --device cpu`. The full errors are below:

(ml) PS C:\Users\my_name\Desktop\PaddleNLP\pipelines> python examples/question-answering/dense_qa_example.py --device cpu
INFO - pipelines.document_stores.faiss -  document_cnt:1398     embedding_cnt:1398
INFO - pipelines.utils.common_utils -  Using devices: PLACE(CPU)
INFO - pipelines.utils.common_utils -  Number of GPUs: 0
[2022-09-11 10:38:14,283] [    INFO] - Already cached C:\Users\my_name\.paddlenlp\models\rocketqa-zh-dureader-query-encoder\rocketqa_zh_dureader_query_encoder.pdparams
[2022-09-11 10:38:23,504] [    INFO] - Already cached C:\Users\my_name\.paddlenlp\models\rocketqa-zh-dureader-query-encoder\rocketqa_zh_dureader_query_encoder.pdparams
[2022-09-11 10:38:32,573] [    INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load 'rocketqa-zh-dureader-query-encoder'.
[2022-09-11 10:38:32,574] [    INFO] - Already cached C:\Users\my_name\.paddlenlp\models\rocketqa-zh-dureader-query-encoder\rocketqa-zh-dureader-vocab.txt
[2022-09-11 10:38:32,583] [    INFO] - tokenizer config file saved in C:\Users\my_name\.paddlenlp\models\rocketqa-zh-dureader-query-encoder\tokenizer_config.json
[2022-09-11 10:38:32,584] [    INFO] - Special tokens file saved in C:\Users\my_name\.paddlenlp\models\rocketqa-zh-dureader-query-encoder\special_tokens_map.json
[2022-09-11 10:38:32,585] [    INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load 'rocketqa-zh-dureader-query-encoder'.
[2022-09-11 10:38:32,585] [    INFO] - Already cached C:\Users\my_name\.paddlenlp\models\rocketqa-zh-dureader-query-encoder\rocketqa-zh-dureader-vocab.txt
[2022-09-11 10:38:32,594] [    INFO] - tokenizer config file saved in C:\Users\my_name\.paddlenlp\models\rocketqa-zh-dureader-query-encoder\tokenizer_config.json
[2022-09-11 10:38:32,594] [    INFO] - Special tokens file saved in C:\Users\my_name\.paddlenlp\models\rocketqa-zh-dureader-query-encoder\special_tokens_map.json
INFO - pipelines.utils.logger -  Logged parameters:
 {'processor': 'TextSimilarityProcessor', 'tokenizer': 'NoneType', 'max_seq_len': '0', 'dev_split': '0.1'}
INFO - pipelines.utils.common_utils -  Using devices: PLACE(CPU)
INFO - pipelines.utils.common_utils -  Number of GPUs: 0
Loading Parameters from:rocketqa-zh-dureader-cross-encoder
[2022-09-11 10:38:32,596] [    INFO] - Already cached C:\Users\my_name\.paddlenlp\models\rocketqa-zh-dureader-cross-encoder\rocketqa_zh_dureader_cross_encoder.pdparams
[2022-09-11 10:38:41,647] [    INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load 'rocketqa-zh-dureader-cross-encoder'.
[2022-09-11 10:38:41,647] [    INFO] - Already cached C:\Users\my_name\.paddlenlp\models\rocketqa-zh-dureader-cross-encoder\rocketqa-zh-dureader-vocab.txt
[2022-09-11 10:38:41,656] [    INFO] - tokenizer config file saved in C:\Users\my_name\.paddlenlp\models\rocketqa-zh-dureader-cross-encoder\tokenizer_config.json
[2022-09-11 10:38:41,656] [    INFO] - Special tokens file saved in C:\Users\my_name\.paddlenlp\models\rocketqa-zh-dureader-cross-encoder\special_tokens_map.json
INFO - pipelines.utils.common_utils -  Using devices: PLACE(CPU)
INFO - pipelines.utils.common_utils -  Number of GPUs: 0
[2022-09-11 10:38:41,663] [    INFO] - We are using <class 'paddlenlp.transformers.ernie_gram.modeling.ErnieGramForQuestionAnswering'> to load 'ernie-gram-zh-finetuned-dureader-robust'.
[2022-09-11 10:38:41,663] [    INFO] - Already cached C:\Users\my_name\.paddlenlp\models\ernie-gram-zh-finetuned-dureader-robust\model_state.pdparams
[2022-09-11 10:38:53,865] [    INFO] - We are using <class 'paddlenlp.transformers.ernie_gram.tokenizer.ErnieGramTokenizer'> to load 'ernie-gram-zh-finetuned-dureader-robust'.
[2022-09-11 10:38:53,865] [    INFO] - Already cached C:\Users\my_name\.paddlenlp\models\ernie-gram-zh-finetuned-dureader-robust\vocab.txt
[2022-09-11 10:38:53,865] [    INFO] - tokenizer config file saved in C:\Users\my_name\.paddlenlp\models\ernie-gram-zh-finetuned-dureader-robust\tokenizer_config.json
[2022-09-11 10:38:53,865] [    INFO] - Special tokens file saved in C:\Users\my_name\.paddlenlp\models\ernie-gram-zh-finetuned-dureader-robust\special_tokens_map.json
INFO - pipelines.utils.logger -  Logged parameters:
 {'processor': 'SquadProcessor', 'tokenizer': 'ErnieGramTokenizer', 'max_seq_len': '256', 'dev_split': '0'}
Traceback (most recent call last):
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\pipelines\base.py", line 466, in run
    "component"]._dispatch_run(**node_input)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\nodes\base.py", line 159, in _dispatch_run
    output, stream = self.run(**run_inputs, **run_params)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\nodes\retriever\base.py", line 119, in run
    headers=headers)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\nodes\retriever\base.py", line 95, in wrapper
    ret = fn(*args, **kwargs)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\nodes\retriever\base.py", line 140, in run_query
    headers=headers)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\nodes\retriever\dense.py", line 206, in retrieve
    return_embedding=False)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\document_stores\faiss.py", line 669, in query_by_embedding
    query_emb, top_k)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\faiss\__init__.py", line 308, in replacement_search
    assert d == self.d
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "examples/question-answering/dense_qa_example.py", line 118, in <module>
    dense_qa_pipeline()
  File "examples/question-answering/dense_qa_example.py", line 101, in dense_qa_pipeline
    prediction = pipe.run(query="北京市有多少个行政区?", params=pipeline_params)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\pipelines\standard_pipelines.py", line 207, in run
    output = self.pipeline.run(query=query, params=params, debug=debug)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\pipelines\base.py", line 470, in run
    f"Exception while running node `{node_id}` with input `{node_input}`: {e}, full stack trace: {tb}"
Exception: Exception while running node `Retriever` with input `{'root_node': 'Query', 'params': {'Retriever': {'top_k': 50}, 'Ranker': {'top_k': 1}, 'Reader': {'top_k': 1}}, 'query': '北京市有多少个行政区?', 'node_id': 'Retriever'}`: , full stack trace: Traceback (most recent call last):
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\pipelines\base.py", line 466, in run
    "component"]._dispatch_run(**node_input)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\nodes\base.py", line 159, in _dispatch_run
    output, stream = self.run(**run_inputs, **run_params)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\nodes\retriever\base.py", line 119, in run
    headers=headers)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\nodes\retriever\base.py", line 95, in wrapper
    ret = fn(*args, **kwargs)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\nodes\retriever\base.py", line 140, in run_query
    headers=headers)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\nodes\retriever\dense.py", line 206, in retrieve
    return_embedding=False)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\document_stores\faiss.py", line 669, in query_by_embedding
    query_emb, top_k)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\faiss\__init__.py", line 308, in replacement_search
    assert d == self.d
AssertionError

(ml) PS C:\Users\my_name\Desktop\PaddleNLP\pipelines> python examples/frequently-asked-question/dense_faq_example.py --device cpu
INFO - pipelines.document_stores.faiss -  document_cnt:1398     embedding_cnt:1398
INFO - pipelines.utils.common_utils -  Using devices: PLACE(CPU)
INFO - pipelines.utils.common_utils -  Number of GPUs: 0
[2022-09-11 10:40:40,382] [    INFO] - Already cached C:\Users\my_name\.paddlenlp\models\rocketqa-zh-dureader-query-encoder\rocketqa_zh_dureader_query_encoder.pdparams
[2022-09-11 10:40:49,457] [    INFO] - Already cached C:\Users\my_name\.paddlenlp\models\rocketqa-zh-dureader-query-encoder\rocketqa_zh_dureader_query_encoder.pdparams
[2022-09-11 10:40:58,598] [    INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load 'rocketqa-zh-dureader-query-encoder'.
[2022-09-11 10:40:58,599] [    INFO] - Already cached C:\Users\my_name\.paddlenlp\models\rocketqa-zh-dureader-query-encoder\rocketqa-zh-dureader-vocab.txt
[2022-09-11 10:40:58,612] [    INFO] - tokenizer config file saved in C:\Users\my_name\.paddlenlp\models\rocketqa-zh-dureader-query-encoder\tokenizer_config.json
[2022-09-11 10:40:58,613] [    INFO] - Special tokens file saved in C:\Users\my_name\.paddlenlp\models\rocketqa-zh-dureader-query-encoder\special_tokens_map.json
[2022-09-11 10:40:58,616] [    INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load 'rocketqa-zh-dureader-query-encoder'.
[2022-09-11 10:40:58,617] [    INFO] - Already cached C:\Users\my_name\.paddlenlp\models\rocketqa-zh-dureader-query-encoder\rocketqa-zh-dureader-vocab.txt
[2022-09-11 10:40:58,630] [    INFO] - tokenizer config file saved in C:\Users\my_name\.paddlenlp\models\rocketqa-zh-dureader-query-encoder\tokenizer_config.json
[2022-09-11 10:40:58,631] [    INFO] - Special tokens file saved in C:\Users\my_name\.paddlenlp\models\rocketqa-zh-dureader-query-encoder\special_tokens_map.json
INFO - pipelines.utils.logger -  Logged parameters:
 {'processor': 'TextSimilarityProcessor', 'tokenizer': 'NoneType', 'max_seq_len': '0', 'dev_split': '0.1'}
INFO - pipelines.utils.common_utils -  Using devices: PLACE(CPU)
INFO - pipelines.utils.common_utils -  Number of GPUs: 0
Loading Parameters from:rocketqa-zh-dureader-cross-encoder
[2022-09-11 10:40:58,634] [    INFO] - Already cached C:\Users\my_name\.paddlenlp\models\rocketqa-zh-dureader-cross-encoder\rocketqa_zh_dureader_cross_encoder.pdparams
[2022-09-11 10:41:07,485] [    INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load 'rocketqa-zh-dureader-cross-encoder'.
[2022-09-11 10:41:07,485] [    INFO] - Already cached C:\Users\my_name\.paddlenlp\models\rocketqa-zh-dureader-cross-encoder\rocketqa-zh-dureader-vocab.txt
[2022-09-11 10:41:07,501] [    INFO] - tokenizer config file saved in C:\Users\my_name\.paddlenlp\models\rocketqa-zh-dureader-cross-encoder\tokenizer_config.json
[2022-09-11 10:41:07,501] [    INFO] - Special tokens file saved in C:\Users\my_name\.paddlenlp\models\rocketqa-zh-dureader-cross-encoder\special_tokens_map.json
Traceback (most recent call last):
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\pipelines\base.py", line 466, in run
    "component"]._dispatch_run(**node_input)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\nodes\base.py", line 159, in _dispatch_run
    output, stream = self.run(**run_inputs, **run_params)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\nodes\retriever\base.py", line 119, in run
    headers=headers)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\nodes\retriever\base.py", line 95, in wrapper
    ret = fn(*args, **kwargs)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\nodes\retriever\base.py", line 140, in run_query
    headers=headers)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\nodes\retriever\dense.py", line 206, in retrieve
    return_embedding=False)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\document_stores\faiss.py", line 669, in query_by_embedding
    query_emb, top_k)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\faiss\__init__.py", line 308, in replacement_search
    assert d == self.d
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "examples/frequently-asked-question/dense_faq_example.py", line 92, in <module>
    dense_faq_pipeline()
  File "examples/frequently-asked-question/dense_faq_example.py", line 86, in dense_faq_pipeline
    prediction = pipe.run(query="企业如何办理养老保险", params=pipeline_params)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\pipelines\standard_pipelines.py", line 244, in run
    output = self.pipeline.run(query=query, params=params, debug=debug)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\pipelines\base.py", line 470, in run
    f"Exception while running node `{node_id}` with input `{node_input}`: {e}, full stack trace: {tb}"
Exception: Exception while running node `Retriever` with input `{'root_node': 'Query', 'params': {'Retriever': {'top_k': 50}, 'Ranker': {'top_k': 1}}, 'query': '企业如何办理养老保险', 'node_id': 'Retriever'}`: , full stack trace: Traceback (most recent call last):
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\pipelines\base.py", line 466, in run
    "component"]._dispatch_run(**node_input)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\nodes\base.py", line 159, in _dispatch_run
    output, stream = self.run(**run_inputs, **run_params)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\nodes\retriever\base.py", line 119, in run
    headers=headers)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\nodes\retriever\base.py", line 95, in wrapper
    ret = fn(*args, **kwargs)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\nodes\retriever\base.py", line 140, in run_query
    headers=headers)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\nodes\retriever\dense.py", line 206, in retrieve
    return_embedding=False)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\pipelines\document_stores\faiss.py", line 669, in query_by_embedding
    query_emb, top_k)
  File "D:\Programs\miniconda3\envs\ml\lib\site-packages\faiss\__init__.py", line 308, in replacement_search
    assert d == self.d
AssertionError
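
The failing check in faiss, `assert d == self.d`, means the dimensionality of the query embedding does not match the dimensionality the persisted index was built with, which typically happens when an on-disk document store created by one example (with one encoder) is reused by another example whose encoder has a different hidden size. A minimal sketch that reproduces the same assertion outside the pipeline, assuming only faiss-cpu and numpy are installed (the dimensions 312 and 768 are purely illustrative):

import numpy as np
import faiss

# Build a tiny index with one embedding size, standing in for a previously persisted store.
index = faiss.IndexFlatIP(312)                        # illustrative dimension
index.add(np.random.rand(5, 312).astype("float32"))

# Query it with embeddings of a different size, as a mismatched query encoder would produce.
query = np.random.rand(1, 768).astype("float32")      # illustrative dimension
try:
    index.search(query, 10)
except AssertionError:
    # same `assert d == self.d` as in the traceback above
    print("query dimension 768 does not match index dimension 312")
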
w5688414 commented 2 years ago

Before running a different application, please delete the old db file:

rm -rf faiss_document_store.db

Then try again.
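
Since the report was run from Windows PowerShell, where `rm -rf` does not work as written, the same cleanup can be done with `Remove-Item faiss_document_store.db`, or with a small cross-platform Python sketch (it assumes it is run from the same pipelines directory the examples were started from):

from pathlib import Path

# Remove the stale document store left behind by a previously run example
# so the next example rebuilds it with matching embedding dimensions.
db_file = Path("faiss_document_store.db")
if db_file.exists():
    db_file.unlink()
    print(f"removed {db_file}")
else:
    print(f"{db_file} not found, nothing to delete")

After the file is removed, re-running dense_qa_example.py or dense_faq_example.py should rebuild the store from scratch, so the index and query dimensions match again.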