PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

[Question]: AttributeError: module 'faiss' has no attribute 'swigfaiss_avx2' #6815

Closed simon7073 closed 1 year ago

simon7073 commented 1 year ago

Please describe your question

My goal

I am trying to follow the steps to reproduce the end-to-end two-way recall semantic search pipeline (端到端两路召回语义检索系统).

My environment

(Note: the documentation does not mention that rustc / cmake are required; I installed them myself based on an error message during setup. Setting that aside for now.)

nvidia-smi | findstr /n "^" | findstr /b "2: 3: 4:"
#2:+---------------------------------------------------------------------------------------+
#3:| NVIDIA-SMI 536.99                 Driver Version: 536.99       CUDA Version: 12.2     |
#4:|-----------------------------------------+----------------------+----------------------+

nvcc -V
#nvcc: NVIDIA (R) Cuda compiler driver
#Copyright (c) 2005-2023 NVIDIA Corporation
#Built on Tue_Jul_11_03:10:21_Pacific_Daylight_Time_2023
#Cuda compilation tools, release 12.2, V12.2.128
#Build cuda_12.2.r12.2/compiler.33053471_0
`pip list`

```powershell
Package                        Version
------------------------------ -----------
aiohttp                        3.8.5
aiosignal                      1.3.1
altair                         5.0.1
anyio                          3.7.1
astor                          0.8.1
async-timeout                  4.0.3
attrdict                       2.0.1
attrs                          23.1.0
Babel                          2.12.1
bce-python-sdk                 0.8.90
beautifulsoup4                 4.12.2
blinker                        1.6.2
blis                           0.7.10
boilerpy3                      1.0.6
Brotli                         1.0.9
cachetools                     5.3.1
catalogue                      2.0.9
certifi                        2023.7.22
cffi                           1.15.1
charset-normalizer             3.2.0
click                          8.0.0
colorama                       0.4.6
colorlog                       6.7.0
confection                     0.1.1
contourpy                      1.1.0
cryptography                   41.0.3
cssselect                      1.2.0
cssutils                       2.7.1
cycler                         0.11.0
cymem                          2.0.7
Cython                         3.0.0
datasets                       2.14.4
decorator                      5.1.1
dill                           0.3.4
elasticsearch                  7.11.0
environs                       9.5.0
et-xmlfile                     1.1.0
Events                         0.5
exceptiongroup                 1.1.3
faiss-cpu                      1.7.2
fastapi                        0.101.1
filelock                       3.12.2
fire                           0.5.0
Flask                          2.2.5
flask-babel                    3.1.0
fonttools                      4.42.1
frozenlist                     1.4.0
fsspec                         2023.6.0
future                         0.18.3
gevent                         23.7.0
geventhttpclient               2.0.2
gitdb                          4.0.10
GitPython                      3.1.32
greenlet                       2.0.2
grpcio                         1.56.0
h11                            0.14.0
htbuilder                      0.6.1
httpcore                       0.17.3
httpx                          0.24.1
huggingface-hub                0.16.4
idna                           3.4
imageio                        2.31.1
imgaug                         0.4.0
importlib-metadata             6.8.0
importlib-resources            6.0.1
itsdangerous                   2.1.2
jieba                          0.42.1
Jinja2                         3.1.2
joblib                         1.3.2
jsonschema                     4.19.0
jsonschema-specifications      2023.7.1
kiwisolver                     1.4.4
langcodes                      3.3.0
langdetect                     1.0.9
lazy_loader                    0.3
llvmlite                       0.40.1
lmdb                           1.4.1
lxml                           4.9.3
Markdown                       3.4.4
markdown-it-py                 3.0.0
MarkupSafe                     2.1.3
marshmallow                    3.20.1
matplotlib                     3.7.2
mdurl                          0.1.2
mmh3                           4.0.1
more-itertools                 10.1.0
multidict                      6.0.4
multiprocess                   0.70.12.2
murmurhash                     1.0.9
networkx                       3.1
nltk                           3.8.1
numba                          0.57.1
numpy                          1.24.4
onnx                           1.14.0
opencv-contrib-python          4.6.0.66
opencv-contrib-python-headless 4.8.0.76
opencv-python                  4.6.0.66
openpyxl                       3.1.2
opt-einsum                     3.3.0
packaging                      23.1
paddle-bfloat                  0.1.7
paddle-pipelines               0.6.0
paddle2onnx                    1.0.6
paddlefsl                      1.1.0
paddlenlp                      2.6.0
paddleocr                      2.6.1.3
paddlepaddle                   2.5.1
pandas                         2.0.3
pathy                          0.10.2
pdf2docx                       0.5.6
pdf2image                      1.16.3
pdfminer.six                   20221105
pdfplumber                     0.10.2
Pillow                         10.0.0
pip                            23.2.1
premailer                      3.10.0
preshed                        3.0.8
protobuf                       3.20.2
psutil                         5.9.5
pyarrow                        13.0.0
pyclipper                      1.3.0.post4
pycparser                      2.21
pycryptodome                   3.18.0
pydantic                       1.10.12
pydeck                         0.8.1b0
Pygments                       2.16.1
pymilvus                       2.3.0
Pympler                        1.0.1
PyMuPDF                        1.20.2
pyparsing                      3.0.9
pypdfium2                      4.18.0
python-dateutil                2.8.2
python-docx                    0.8.11
python-dotenv                  1.0.0
python-multipart               0.0.6
python-rapidjson               1.10
pytz                           2023.3
PyWavelets                     1.4.1
PyYAML                         6.0.1
rapidfuzz                      3.2.0
rarfile                        4.0
referencing                    0.30.2
regex                          2023.8.8
requests                       2.31.0
rich                           13.5.2
rpds-py                        0.9.2
safetensors                    0.3.3
scikit-image                   0.21.0
scikit-learn                   1.3.0
scipy                          1.11.2
semver                         3.0.1
sentencepiece                  0.1.99
seqeval                        1.2.2
setuptools                     68.1.2
shapely                        2.0.1
six                            1.16.0
smart-open                     6.3.0
smmap                          5.0.0
sniffio                        1.3.0
soupsieve                      2.4.1
spacy                          3.6.1
spacy-legacy                   3.0.12
spacy-loggers                  1.0.4
SQLAlchemy                     1.4.49
SQLAlchemy-Utils               0.41.1
srsly                          2.4.7
sseclient-py                   1.7.2
st-annotated-text              4.0.0
starlette                      0.27.0
streamlit                      1.11.1
termcolor                      2.3.0
thinc                          8.1.12
threadpoolctl                  3.2.0
tifffile                       2023.8.12
toml                           0.10.2
toolz                          0.12.0
tornado                        6.3.3
tqdm                           4.66.1
tritonclient                   2.36.0
typer                          0.9.0
typing_extensions              4.5.0
tzdata                         2023.3
tzlocal                        5.0.1
ujson                          5.8.0
urllib3                        1.26.16
uvicorn                        0.23.2
validators                     0.21.2
visualdl                       2.5.3
wasabi                         1.1.2
watchdog                       3.0.0
Werkzeug                       2.3.7
wheel                          0.41.2
wordcloud                      1.8.2.2
xxhash                         3.3.0
yarl                           1.9.2
zipp                           3.16.2
zope.event                     5.0
zope.interface                 6.0
```

My installation steps

# Create a new environment
mamba create -n multi_recall_39 python=3.9 -y
conda activate multi_recall_39

# Clone the PaddleNLP source code
cd D:\ProgramData
git clone https://github.com/PaddlePaddle/PaddleNLP.git

# Activate the environment and enter the pipelines directory
conda activate multi_recall_39
cd D:\ProgramData\PaddleNLP\pipelines

# Add paddlepaddle-gpu to requirements.txt
# Install the dependencies
python -m pip install  -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple -i https://mirror.baidu.com/pypi/simple -i https://pypi.org/simple -f https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html
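# (Note: pip honors only the last -i/--index-url, so the two earlier mirrors
# above are ignored. To actually consult several indexes, keep one -i and pass
# the others via --extra-index-url, e.g.:
# python -m pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple --extra-index-url https://pypi.org/simple -f https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html)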

# Install pipelines (local install)
python setup.py install
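# (Note: `python setup.py install` is deprecated; `python -m pip install .`
# from the pipelines directory is the usual replacement.)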

# Verify that paddle installed correctly
python -c "import paddle;paddle.utils.run_check()"
#Running verify PaddlePaddle program ...
#I0824 13:33:55.042425 15380 interpretercore.cc:237] New Executor is Running.
#I0824 13:33:55.232187 15380 interpreter_util.cc:518] Standalone Executor is Used.
#PaddlePaddle works well on 1 CPU.
#PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.

$env:CUDA_VISIBLE_DEVICES=0
python examples/semantic-search/multi_recall_semantic_search_example.py --device cpu --search_engine elastic # this command fails with the error below

Error message

(multi_recall_39) PS D:\ProgramData\PaddleNLP\pipelines> python examples/semantic-search/multi_recall_semantic_search_example.py --device cpu --search_engine elastic
D:\.conda\envs\multi_recall_39\lib\site-packages\_distutils_hack\__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
Traceback (most recent call last):
  File "D:\ProgramData\PaddleNLP\pipelines\examples\semantic-search\multi_recall_semantic_search_example.py", line 17, in <module>
    from pipelines.document_stores import (
  File "D:\.conda\envs\multi_recall_39\lib\site-packages\paddle_pipelines-0.6.0-py3.9.egg\pipelines\__init__.py", line 39, in <module>
    from pipelines import utils
  File "D:\.conda\envs\multi_recall_39\lib\site-packages\paddle_pipelines-0.6.0-py3.9.egg\pipelines\utils\__init__.py", line 15, in <module>
    from pipelines.utils.preprocessing import (  # isort: skip
  File "D:\.conda\envs\multi_recall_39\lib\site-packages\paddle_pipelines-0.6.0-py3.9.egg\pipelines\utils\preprocessing.py", line 20, in <module>
    from pipelines.nodes.base import BaseComponent
  File "D:\.conda\envs\multi_recall_39\lib\site-packages\paddle_pipelines-0.6.0-py3.9.egg\pipelines\nodes\__init__.py", line 48, in <module>
    from pipelines.nodes.reader import BaseReader, ErnieReader
  File "D:\.conda\envs\multi_recall_39\lib\site-packages\paddle_pipelines-0.6.0-py3.9.egg\pipelines\nodes\reader\__init__.py", line 16, in <module>
    from pipelines.nodes.reader.ernie_dureader import ErnieReader
  File "D:\.conda\envs\multi_recall_39\lib\site-packages\paddle_pipelines-0.6.0-py3.9.egg\pipelines\nodes\reader\ernie_dureader.py", line 28, in <module>
    from pipelines.document_stores import BaseDocumentStore
  File "D:\.conda\envs\multi_recall_39\lib\site-packages\paddle_pipelines-0.6.0-py3.9.egg\pipelines\document_stores\__init__.py", line 40, in <module>
    FAISSDocumentStore = safe_import("pipelines.document_stores.faiss", "FAISSDocumentStore", "faiss")
  File "D:\.conda\envs\multi_recall_39\lib\site-packages\paddle_pipelines-0.6.0-py3.9.egg\pipelines\utils\import_utils.py", line 39, in safe_import
    module = importlib.import_module(import_path)
  File "D:\.conda\envs\multi_recall_39\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "D:\.conda\envs\multi_recall_39\lib\site-packages\paddle_pipelines-0.6.0-py3.9.egg\pipelines\document_stores\faiss.py", line 49, in <module>
    class FAISSDocumentStore(SQLDocumentStore):
  File "D:\.conda\envs\multi_recall_39\lib\site-packages\paddle_pipelines-0.6.0-py3.9.egg\pipelines\document_stores\faiss.py", line 66, in FAISSDocumentStore
    faiss_index: Union[dict, faiss.swigfaiss_avx2.IndexFlat] = None,
AttributeError: module 'faiss' has no attribute 'swigfaiss_avx2'
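(For context: as the traceback shows, the failure is a type annotation in pipelines/document_stores/faiss.py that references faiss.swigfaiss_avx2 unconditionally. faiss only exposes that submodule when its AVX2 build is loaded; plain builds expose swigfaiss instead. A minimal diagnostic sketch, assuming it runs inside the same environment:)

```python
# Diagnostic sketch: check which SWIG backend this faiss install exposes.
# The pipelines annotation needs faiss.swigfaiss_avx2; if only swigfaiss
# is present, the AttributeError above is expected.
import faiss

for backend in ("swigfaiss_avx2", "swigfaiss"):
    print(f"faiss.{backend} available: {hasattr(faiss, backend)}")
```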

Other information

I tried Python 3.10, Python 3.9, faiss-cpu==1.7.2, and faiss-gpu, all to no avail.

w5688414 commented 1 year ago

I checked my local environment; it has:

faiss-cpu                      1.7.4

simon7073 commented 1 year ago

Sorry, I had already switched faiss-cpu to a different version because of this same error, but that did not resolve it. I'm not sure what additional information would help solve this problem, but I'll provide whatever I can. Thank you.

w5688414 commented 1 year ago

conda install faiss-cpu -c pytorch
simon7073 commented 1 year ago

conda install faiss-cpu -c pytorch

Thank you, that command was very helpful. It resolved the AttributeError: module 'faiss' has no attribute 'swigfaiss_avx2' error. However, I still cannot reproduce the end-to-end two-way recall semantic search pipeline (端到端两路召回语义检索系统).

Output from installing faiss-cpu via conda:

```
The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    faiss-cpu-1.7.4            | py3.9_h2272212_0_cpu          1.4 MB  pytorch
    intel-openmp-2023.2.0      | h57928b3_49496                2.4 MB  conda-forge
    libfaiss-1.7.4             | h2e52968_0_cpu                1.2 MB  pytorch
    libhwloc-2.9.2             | default_haede6df_1009         2.5 MB  conda-forge
    numpy-1.25.2               | py39h816b6a6_0                5.8 MB  conda-forge
    tbb-2021.10.0              | h91493d7_0                    152 KB  conda-forge
    ------------------------------------------------------------
                                                  Total:       13.4 MB

The following NEW packages will be INSTALLED:

  faiss-cpu          pytorch/win-64::faiss-cpu-1.7.4-py3.9_h2272212_0_cpu
  intel-openmp       conda-forge/win-64::intel-openmp-2023.2.0-h57928b3_49496
  libblas            conda-forge/win-64::libblas-3.9.0-17_win64_mkl
  libcblas           conda-forge/win-64::libcblas-3.9.0-17_win64_mkl
  libfaiss           pytorch/win-64::libfaiss-1.7.4-h2e52968_0_cpu
  libhwloc           conda-forge/win-64::libhwloc-2.9.2-default_haede6df_1009
  libiconv           conda-forge/win-64::libiconv-1.17-h8ffe710_0
  liblapack          conda-forge/win-64::liblapack-3.9.0-17_win64_mkl
  libxml2            conda-forge/win-64::libxml2-2.11.5-hc3477c8_1
  mkl                conda-forge/win-64::mkl-2022.1.0-h6a75c08_874
  numpy              conda-forge/win-64::numpy-1.25.2-py39h816b6a6_0
  pthreads-win32     conda-forge/win-64::pthreads-win32-2.9.1-hfa6e2cd_3
  python_abi         conda-forge/win-64::python_abi-3.9-3_cp39
  tbb                conda-forge/win-64::tbb-2021.10.0-h91493d7_0
```

After the installation completed (conda install numba may also be needed), I ran python examples/semantic-search/multi_recall_semantic_search_example.py --device gpu --search_engine elastic again, and it produced an error.

Console output:

```
(multi_recall_39) PS D:\WorkSpace\PaddleNLP\pipelines> python examples/semantic-search/multi_recall_semantic_search_example.py --device gpu --search_engine elastic
D:\Scoop\conda\envs\multi_recall_39\lib\site-packages\_distutils_hack\__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
INFO - pipelines.document_stores.base - Numba not found, replacing njit() with no-op implementation. Enable it with 'pip install numba'.
INFO - pipelines.utils.import_utils - Found data stored in `data/dureader_dev`. Delete this first if you really want to fetch new data.
INFO - pipelines.utils.preprocessing - Converting data\dureader_dev\dureader_dev\dev0.txt
INFO - pipelines.utils.preprocessing - Converting data\dureader_dev\dureader_dev\dev1.txt
INFO - pipelines.utils.preprocessing - Converting data\dureader_dev\dureader_dev\dev10.txt
....
INFO - pipelines.utils.preprocessing - Converting data\dureader_dev\dureader_dev\dev997.txt
INFO - pipelines.utils.preprocessing - Converting data\dureader_dev\dureader_dev\dev998.txt
INFO - pipelines.utils.preprocessing - Converting data\dureader_dev\dureader_dev\dev999.txt
INFO - elasticsearch - HEAD http://localhost:9200/ [status:200 request:0.039s]
INFO - elasticsearch - PUT http://localhost:9200/dureader_nano_query_encoder [status:200 request:2.594s]
INFO - elasticsearch - PUT http://localhost:9200/label [status:200 request:2.088s]
INFO - elasticsearch - POST http://localhost:9200/_bulk [status:200 request:1.378s]
INFO - elasticsearch - POST http://localhost:9200/_bulk [status:200 request:1.541s]
INFO - elasticsearch - POST http://localhost:9200/_bulk [status:200 request:1.542s]
INFO - pipelines.utils.common_utils - Using devices: PLACE(GPU:0)
INFO - pipelines.utils.common_utils - Number of GPUs: 1
[2023-08-30 13:30:32,454] [ INFO] - We are using (, False) to load 'rocketqa-zh-nano-query-encoder'.
[2023-08-30 13:30:32,454] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/transformers/ernie_3.0/ernie_3.0_nano_zh_vocab.txt and saved to C:\Users\Simon\.paddlenlp\models\rocketqa-zh-nano-query-encoder
[2023-08-30 13:30:32,800] [ INFO] - Downloading ernie_3.0_nano_zh_vocab.txt from https://bj.bcebos.com/paddlenlp/models/transformers/ernie_3.0/ernie_3.0_nano_zh_vocab.txt
100%|██████████| 182k/182k [00:00<00:00, 299kB/s]
[2023-08-30 13:30:33,933] [ INFO] - tokenizer config file saved in C:\Users\Simon\.paddlenlp\models\rocketqa-zh-nano-query-encoder\tokenizer_config.json
[2023-08-30 13:30:33,934] [ INFO] - Special tokens file saved in C:\Users\Simon\.paddlenlp\models\rocketqa-zh-nano-query-encoder\special_tokens_map.json
[2023-08-30 13:30:33,937] [ INFO] - Configuration saved in C:\Users\Simon\.paddlenlp\models\rocketqa-zh-nano-query-encoder\config.json
[2023-08-30 13:30:34,415] [ INFO] - Downloading https://paddlenlp.bj.bcebos.com/models/transformers/rocketqa/rocketqa-zh-nano-query-encoder.pdparams
[2023-08-30 13:30:34,417] [ INFO] - Downloading rocketqa-zh-nano-query-encoder.pdparams from https://paddlenlp.bj.bcebos.com/models/transformers/rocketqa/rocketqa-zh-nano-query-encoder.pdparams
100%|██████████| 68.3M/68.3M [00:55<00:00, 1.28MB/s]
[2023-08-30 13:31:31,839] [ INFO] - Loading weights file model_state.pdparams from cache at C:\Users\Simon\.paddlenlp\models\rocketqa-zh-nano-query-encoder\model_state.pdparams
[2023-08-30 13:31:31,884] [ INFO] - Loaded weights file from disk, setting weights to model.
W0830 13:31:33.459180 24700 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 12.2, Runtime API Version: 12.0
W0830 13:31:33.464184 24700 gpu_resources.cc:149] device: 0, cuDNN Version: 8.9.
[2023-08-30 13:31:33,581] [ INFO] - All model checkpoint weights were used when initializing ErnieEncoder.
[2023-08-30 13:31:33,581] [ WARNING] - Some weights of ErnieEncoder were not initialized from the model checkpoint at rocketqa-zh-nano-query-encoder and are newly initialized: ['emb_reduce_linear.bias', 'emb_reduce_linear.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[2023-08-30 13:31:33,587] [ INFO] - Converting to the inference model cost a little time.
I0830 13:31:36.350524 24700 interpretercore.cc:237] New Executor is Running.
[2023-08-30 13:31:36,528] [ INFO] - The inference model save in the path:C:\Users\Simon\.paddlenlp\taskflow\rocketqa-zh-nano-query-encoder\static\inference
[2023-08-30 13:31:37,562] [ INFO] - We are using (, False) to load 'rocketqa-zh-nano-para-encoder'.
[2023-08-30 13:31:37,563] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/transformers/ernie_3.0/ernie_3.0_nano_zh_vocab.txt and saved to C:\Users\Simon\.paddlenlp\models\rocketqa-zh-nano-para-encoder
[2023-08-30 13:31:37,815] [ INFO] - Downloading ernie_3.0_nano_zh_vocab.txt from https://bj.bcebos.com/paddlenlp/models/transformers/ernie_3.0/ernie_3.0_nano_zh_vocab.txt
100%|██████████| 182k/182k [00:00<00:00, 706kB/s]
[2023-08-30 13:31:38,356] [ INFO] - tokenizer config file saved in C:\Users\Simon\.paddlenlp\models\rocketqa-zh-nano-para-encoder\tokenizer_config.json
[2023-08-30 13:31:38,357] [ INFO] - Special tokens file saved in C:\Users\Simon\.paddlenlp\models\rocketqa-zh-nano-para-encoder\special_tokens_map.json
[2023-08-30 13:31:38,360] [ INFO] - Configuration saved in C:\Users\Simon\.paddlenlp\models\rocketqa-zh-nano-para-encoder\config.json
[2023-08-30 13:31:38,614] [ INFO] - Downloading https://paddlenlp.bj.bcebos.com/models/transformers/rocketqa/rocketqa-zh-nano-para-encoder.pdparams
[2023-08-30 13:31:38,616] [ INFO] - Downloading rocketqa-zh-nano-para-encoder.pdparams from https://paddlenlp.bj.bcebos.com/models/transformers/rocketqa/rocketqa-zh-nano-para-encoder.pdparams
100%|██████████| 68.3M/68.3M [00:42<00:00, 1.69MB/s]
[2023-08-30 13:32:21,382] [ INFO] - Loading weights file model_state.pdparams from cache at C:\Users\Simon\.paddlenlp\models\rocketqa-zh-nano-para-encoder\model_state.pdparams
[2023-08-30 13:32:21,435] [ INFO] - Loaded weights file from disk, setting weights to model.
Traceback (most recent call last):
  File "D:\WorkSpace\PaddleNLP\pipelines\examples\semantic-search\multi_recall_semantic_search_example.py", line 150, in <module>
    semantic_search_tutorial()
  File "D:\WorkSpace\PaddleNLP\pipelines\examples\semantic-search\multi_recall_semantic_search_example.py", line 114, in semantic_search_tutorial
    dpr_retriever, bm_retriever = get_retrievers(use_gpu)
  File "D:\WorkSpace\PaddleNLP\pipelines\examples\semantic-search\multi_recall_semantic_search_example.py", line 90, in get_retrievers
    dpr_retriever = DensePassageRetriever(
  File "D:\Scoop\conda\envs\multi_recall_39\lib\site-packages\paddle_pipelines-0.6.0-py3.9.egg\pipelines\nodes\retriever\dense.py", line 175, in __init__
    self.passage_encoder = Taskflow(
  File "D:\Scoop\conda\envs\multi_recall_39\lib\site-packages\paddlenlp\taskflow\taskflow.py", line 804, in __init__
    self.task_instance = task_class(
  File "D:\Scoop\conda\envs\multi_recall_39\lib\site-packages\paddlenlp\taskflow\text_feature_extraction.py", line 169, in __init__
    self._get_inference_model()
  File "D:\Scoop\conda\envs\multi_recall_39\lib\site-packages\paddlenlp\taskflow\task.py", line 341, in _get_inference_model
    self._construct_model(self.model)
  File "D:\Scoop\conda\envs\multi_recall_39\lib\site-packages\paddlenlp\taskflow\text_feature_extraction.py", line 187, in _construct_model
    self._model = ErnieDualEncoder(
  File "D:\Scoop\conda\envs\multi_recall_39\lib\site-packages\paddlenlp\transformers\semantic_search\modeling.py", line 96, in __init__
    self.query_ernie = ErnieEncoder.from_pretrained(query_model_name_or_path, output_emb_size=output_emb_size)
  File "D:\Scoop\conda\envs\multi_recall_39\lib\site-packages\paddlenlp\transformers\model_utils.py", line 1956, in from_pretrained
    model, missing_keys, unexpected_keys, mismatched_keys = cls._load_pretrained_model(
  File "D:\Scoop\conda\envs\multi_recall_39\lib\site-packages\paddlenlp\transformers\model_utils.py", line 1722, in _load_pretrained_model
    raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for ErnieEncoder:
        Skip loading for classifier.weight. classifier.weight receives a shape [768, 2], but the expected shape is [312, 2].
        You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
```

About the final error:

RuntimeError: Error(s) in loading state_dict for ErnieEncoder:
        Skip loading for classifier.weight. classifier.weight receives a shape [768, 2], but the expected shape is [312, 2].
        You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
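(For reference, the keyword suggested by the error message goes on the model's from_pretrained call. A hedged sketch, using the encoder class and model name from the traceback; note that this only skips the mismatched tensor, which then stays randomly initialized, so it works around the symptom rather than fixing the underlying model/checkpoint mismatch:)

```python
# Sketch of the workaround the error message itself suggests (not a
# root-cause fix): load the checkpoint but skip shape-mismatched tensors.
from paddlenlp.transformers.semantic_search.modeling import ErnieEncoder

model = ErnieEncoder.from_pretrained(
    "rocketqa-zh-nano-query-encoder",  # model name taken from the traceback
    ignore_mismatched_sizes=True,      # skipped weights remain randomly initialized
)
```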

I don't understand what went wrong in between. What should I do?

PS: I performed the same steps on two machines, and the problem reproduces on both.

w5688414 commented 1 year ago

I've pushed an update; please check whether the problem persists.

simon7073 commented 1 year ago

Thank you! And my apologies: I had been blindly following the documentation and overlooked the command-line parameters of the multi_recall_semantic_search_example.py script. I'll close this issue now. Thanks again!

tito-dt commented 1 year ago

conda install faiss-cpu -c pytorch

Hello, I'm hitting the same error as above. How can I deal with this without Anaconda installed? faiss-cpu 1.7.4, torch 2.0.1
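(One pip-only workaround that is sometimes suggested, untested here: alias the missing AVX2 SWIG module to the plain one before pipelines is imported, so the faiss.swigfaiss_avx2.IndexFlat annotation in pipelines/document_stores/faiss.py can resolve. A sketch under that assumption:)

```python
# Hedged, pip-only workaround sketch (no conda needed): alias the missing
# AVX2 SWIG module to the plain one before pipelines evaluates its
# faiss.swigfaiss_avx2.IndexFlat annotation. This papers over the type
# hint only; faiss itself works the same either way.
import faiss

if not hasattr(faiss, "swigfaiss_avx2"):
    faiss.swigfaiss_avx2 = faiss.swigfaiss

# Import pipelines only after the alias is in place.
from pipelines.document_stores import FAISSDocumentStore
```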

imempty commented 11 months ago

Hello, I've run into the same problem. The error is:

RuntimeError: Error(s) in loading state_dict for ErnieForSequenceClassification:
        Skip loading for embeddings.word_embeddings.weight. embeddings.word_embeddings.weight receives a shape [30522, 768], but the expected shape is [40000, 768].
        Skip loading for embeddings.position_embeddings.weight. embeddings.position_embeddings.weight receives a shape [512, 768], but the expected shape is [2048, 768].
        You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.

By "updated", do you mean updating PaddleNLP? I'm already on the latest 2.6.1, with paddlepaddle 2.5.2. The line that loads the pretrained model is tokenizer = AutoTokenizer.from_pretrained("./ernie-3.0-base-zh/"). Since my machine has no internet access, I downloaded the ernie model files offline and placed them in the same directory as train.py. The launch command is python train.py --dataset_dir "./data" --device "cpu" --max_seq_length 128 --model_name "ernie-3.0-base-zh" --batch_size 32 --early_stop --epochs 5. I found that the model_name argument cannot take a file path; it only accepts one of a few predefined model names. So I had to modify the code and tried tokenizer = AutoTokenizer.from_pretrained("./ernie-3.0-base-zh/", {"ignore_mismatched_sizes": True}), which also failed, this time complaining that the repo name is invalid.
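(A note on the call form: ignore_mismatched_sizes is a keyword argument, and the shape mismatch is raised while loading the model weights, not the tokenizer, so it belongs on the model's from_pretrained call; passing a dict as a second positional argument is interpreted differently, which would explain the "repo name" complaint. A hedged sketch, with the local path taken from the comment above:)

```python
# Sketch: pass ignore_mismatched_sizes as a keyword to the *model* load.
# The mismatched embeddings are then skipped and left randomly initialized.
# Note the reported shapes (vocab 30522, positions 512) look like an English
# BERT checkpoint rather than ernie-3.0-base-zh (40000 / 2048), so the local
# files themselves may be the wrong model.
from paddlenlp.transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./ernie-3.0-base-zh/")
model = AutoModelForSequenceClassification.from_pretrained(
    "./ernie-3.0-base-zh/",
    ignore_mismatched_sizes=True,  # keyword argument, not a positional dict
)
```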

imempty commented 11 months ago

@w5688414