fynnfluegge / codeqai

Local first semantic code search and chat powered by vector embeddings and LLMs
Apache License 2.0

Assertion error in faiss #36

Open dinarior opened 6 months ago

dinarior commented 6 months ago

Very cool project! While trying to get it to work, I ran into the following issue:

Using local embeddings (INSTRUCTOR_Transformer) and a llama.cpp model, any search/chat ends in the following assertion error:

load INSTRUCTOR_Transformer
max_seq_length  512
🔎 Enter a search pattern: preprocessing
⠹ 🤖 Processing...Traceback (most recent call last):
  File "/Users/dinari/.local/bin/codeqai", line 10, in <module>
    sys.exit(main())
  File "/Users/dinari/Library/Application Support/pipx/venvs/codeqai/lib/python3.10/site-packages/codeqai/__main__.py", line 5, in main
    app.run()
  File "/Users/dinari/Library/Application Support/pipx/venvs/codeqai/lib/python3.10/site-packages/codeqai/app.py", line 177, in run
    similarity_result = vector_store.similarity_search(search_pattern)
  File "/Users/dinari/Library/Application Support/pipx/venvs/codeqai/lib/python3.10/site-packages/codeqai/vector_store.py", line 131, in similarity_search
    return self.db.similarity_search(query, k=4)
  File "/Users/dinari/Library/Application Support/pipx/venvs/codeqai/lib/python3.10/site-packages/langchain_community/vectorstores/faiss.py", line 544, in similarity_search
    docs_and_scores = self.similarity_search_with_score(
  File "/Users/dinari/Library/Application Support/pipx/venvs/codeqai/lib/python3.10/site-packages/langchain_community/vectorstores/faiss.py", line 417, in similarity_search_with_score
    docs = self.similarity_search_with_score_by_vector(
  File "/Users/dinari/Library/Application Support/pipx/venvs/codeqai/lib/python3.10/site-packages/langchain_community/vectorstores/faiss.py", line 302, in similarity_search_with_score_by_vector
    scores, indices = self.index.search(vector, k if filter is None else fetch_k)
  File "/Users/dinari/Library/Application Support/pipx/venvs/codeqai/lib/python3.10/site-packages/faiss/class_wrappers.py", line 329, in replacement_search
    assert d == self.d
AssertionError

Running on Apple Silicon (arm64).
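For debugging, here is a minimal sketch that compares the dimension of the persisted FAISS index against the dimension of a freshly embedded query; the `assert d == self.d` above fires exactly when the two differ. The index path and instructor model below are assumptions for illustration, not codeqai's actual cache layout.

# Minimal diagnostic sketch (index path and model name are assumptions).
import faiss
from InstructorEmbedding import INSTRUCTOR

index = faiss.read_index("index.faiss")        # hypothetical path to the persisted index
model = INSTRUCTOR("hkunlp/instructor-large")  # embedding model assumed for illustration

query = model.encode([["Represent the question for retrieving code:", "preprocessing"]])
print("index dimension:", index.d)             # dimension the index was built with
print("query dimension:", query.shape[1])      # dimension produced at query time
# faiss's `assert d == self.d` fails when these two numbers differ.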

ghost commented 5 months ago

Same on an Ubuntu EC2 instance (g54xl):

💬 Ask anything about the codebase: what is this codebase used for?
⠹ 🤖 Processing...Traceback (most recent call last):
  File "/code/codeqai/bin/codeqai", line 8, in <module>
    sys.exit(main())
  File "/code/codeqai/lib/python3.10/site-packages/codeqai/__main__.py", line 5, in main
    app.run()
  File "/code/codeqai/lib/python3.10/site-packages/codeqai/app.py", line 208, in run
    result = qa(question)
  File "/code/codeqai/lib/python3.10/site-packages/langchain_core/_api/deprecation.py", line 145, in warning_emitting_wrapper
    return wrapped(*args, **kwargs)
  File "/code/codeqai/lib/python3.10/site-packages/langchain/chains/base.py", line 363, in __call__
    return self.invoke(
  File "/code/codeqai/lib/python3.10/site-packages/langchain/chains/base.py", line 162, in invoke
    raise e
  File "/code/codeqai/lib/python3.10/site-packages/langchain/chains/base.py", line 156, in invoke
    self._call(inputs, run_manager=run_manager)
  File "/code/codeqai/lib/python3.10/site-packages/langchain/chains/conversational_retrieval/base.py", line 155, in _call
    docs = self._get_docs(new_question, inputs, run_manager=_run_manager)
  File "/code/codeqai/lib/python3.10/site-packages/langchain/chains/conversational_retrieval/base.py", line 317, in _get_docs
    docs = self.retriever.get_relevant_documents(
  File "/code/codeqai/lib/python3.10/site-packages/langchain_core/retrievers.py", line 224, in get_relevant_documents
    raise e
  File "/code/codeqai/lib/python3.10/site-packages/langchain_core/retrievers.py", line 217, in get_relevant_documents
    result = self._get_relevant_documents(
  File "/code/codeqai/lib/python3.10/site-packages/langchain_core/vectorstores.py", line 663, in _get_relevant_documents
    docs = self.vectorstore.max_marginal_relevance_search(
  File "/code/codeqai/lib/python3.10/site-packages/langchain_community/vectorstores/faiss.py", line 789, in max_marginal_relevance_search
    docs = self.max_marginal_relevance_search_by_vector(
  File "/code/codeqai/lib/python3.10/site-packages/langchain_community/vectorstores/faiss.py", line 724, in max_marginal_relevance_search_by_vector
    docs_and_scores = self.max_marginal_relevance_search_with_score_by_vector(
  File "/code/codeqai/lib/python3.10/site-packages/langchain_community/vectorstores/faiss.py", line 602, in max_marginal_relevance_search_with_score_by_vector
    scores, indices = self.index.search(
  File "/code/codeqai/lib/python3.10/site-packages/faiss/__init__.py", line 308, in replacement_search
    assert d == self.d
AssertionError
fynnfluegge commented 5 months ago

Thanks for reporting. There is a known issue with sentence-transformers: https://github.com/PromtEngineer/localGPT/issues/722

Does this issue happen only when using INSTRUCTOR_Transformer together with llama.cpp?

ghost commented 5 months ago

That was the config I was using, yes. (I believe I hit other fatal errors when trying either of the other sentence-transformer options; I will go back and try to reproduce.)

I looked at the thread you posted and can try running with sentence-transformers locked at 2.2.2.
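For reference, pinning the version inside a pipx-managed install can be done with pipx inject (shown here with the version suggested in the linked thread):

pipx inject codeqai sentence-transformers==2.2.2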

dinarior commented 5 months ago

I was already using 2.2.2 (injected into the pipx installation); here is the pip list:

aiohttp                   3.9.3
aiosignal                 1.3.1
altair                    5.2.0
annotated-types           0.6.0
anyio                     4.3.0
async-timeout             4.0.3
attrs                     23.2.0
blessed                   1.20.0
blinker                   1.7.0
cachetools                5.3.3
certifi                   2024.2.2
charset-normalizer        3.3.2
click                     8.1.7
codeqai                   0.0.14
dataclasses-json          0.6.4
diskcache                 5.6.3
distro                    1.9.0
editor                    1.6.6
exceptiongroup            1.2.0
faiss-cpu                 1.7.4
filelock                  3.13.1
frozenlist                1.4.1
fsspec                    2024.2.0
gitdb                     4.0.11
GitPython                 3.1.42
h11                       0.14.0
httpcore                  1.0.4
httpx                     0.27.0
huggingface-hub           0.21.3
idna                      3.6
importlib-metadata        7.0.1
inquirer                  3.2.4
InstructorEmbedding       1.0.1
Jinja2                    3.1.3
joblib                    1.3.2
jsonpatch                 1.33
jsonpointer               2.4
jsonschema                4.21.1
jsonschema-specifications 2023.12.1
langchain                 0.1.5
langchain-community       0.0.17
langchain-core            0.1.23
langchain-openai          0.0.5
langsmith                 0.0.87
llama_cpp_python          0.2.53
markdown-it-py            3.0.0
MarkupSafe                2.1.5
marshmallow               3.21.0
mdurl                     0.1.2
mpmath                    1.3.0
multidict                 6.0.5
mypy-extensions           1.0.0
networkx                  3.2.1
nltk                      3.8.1
numpy                     1.26.4
openai                    1.13.3
packaging                 23.2
pandas                    2.2.1
pillow                    10.2.0
pip                       24.0
protobuf                  4.25.3
pyarrow                   15.0.0
pydantic                  2.6.3
pydantic_core             2.16.3
pydeck                    0.8.1b0
Pygments                  2.17.2
python-dateutil           2.8.2
python-dotenv             1.0.1
pytz                      2024.1
PyYAML                    6.0.1
readchar                  4.0.5
referencing               0.33.0
regex                     2023.12.25
requests                  2.31.0
rich                      13.7.1
rpds-py                   0.18.0
runs                      1.2.2
safetensors               0.4.2
scikit-learn              1.4.1.post1
scipy                     1.12.0
sentence-transformers     2.2.2
sentencepiece             0.2.0
setuptools                65.5.0
six                       1.16.0
smmap                     5.0.1
sniffio                   1.3.1
SQLAlchemy                2.0.27
streamlit                 1.31.1
sympy                     1.12
tenacity                  8.2.3
termcolor                 2.4.0
threadpoolctl             3.3.0
tiktoken                  0.5.2
tokenizers                0.15.2
toml                      0.10.2
toolz                     0.12.1
torch                     2.2.1
torchvision               0.17.1
tornado                   6.4
tqdm                      4.66.2
transformers              4.38.1
tree-sitter               0.20.4
tree-sitter-languages     1.10.2
typing_extensions         4.10.0
typing-inspect            0.9.0
tzdata                    2024.1
tzlocal                   5.2
urllib3                   2.2.1
validators                0.22.0
wcwidth                   0.2.13
xmod                      1.8.1
yarl                      1.9.4
yaspin                    3.0.1
zipp                      3.17.0

Switching to another tokenizer, this error did not occur.

umbrellateng commented 4 months ago

I also encountered this problem when using the LangChain framework to create an AutoGPT agent; the model used is glm-4.

> Entering new LLMChain chain...
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/models/mpnet/modeling_mpnet.py:1054: UserWarning: cumsum_out_mps supported by MPS on MacOS 13+, please upgrade (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/UnaryOps.mm:425.)
  incremental_indices = torch.cumsum(mask, dim=1).type_as(mask) * mask
Traceback (most recent call last):
  File "/Users/apple/dev/src/github.com/umbrellateng/AGILearn/early/auto_gpt.py", line 63, in <module>
    auto_gpt_learn()
  File "/Users/apple/dev/src/github.com/umbrellateng/AGILearn/early/auto_gpt.py", line 59, in auto_gpt_learn
    agent.run(["Write a weather report for Beijing today"])
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain_experimental/autonomous_agents/autogpt/agent.py", line 93, in run
    assistant_reply = self.chain.run(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain_core/_api/deprecation.py", line 145, in warning_emitting_wrapper
    return wrapped(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain/chains/base.py", line 550, in run
    return self(kwargs, callbacks=callbacks, tags=tags, metadata=metadata)[
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain_core/_api/deprecation.py", line 145, in warning_emitting_wrapper
    return wrapped(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain/chains/base.py", line 378, in __call__
    return self.invoke(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain/chains/base.py", line 163, in invoke
    raise e
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain/chains/base.py", line 153, in invoke
    self._call(inputs, run_manager=run_manager)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain/chains/llm.py", line 103, in _call
    response = self.generate([inputs], run_manager=run_manager)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain/chains/llm.py", line 112, in generate
    prompts, stop = self.prep_prompts(input_list, run_manager=run_manager)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain/chains/llm.py", line 174, in prep_prompts
    prompt = self.prompt.format_prompt(**selected_inputs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain_core/prompts/chat.py", line 535, in format_prompt
    messages = self.format_messages(**kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain_experimental/autonomous_agents/autogpt/prompt.py", line 76, in format_messages
    relevant_docs = memory.get_relevant_documents(str(previous_messages[-10:]))
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain_core/retrievers.py", line 245, in get_relevant_documents
    raise e
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain_core/retrievers.py", line 238, in get_relevant_documents
    result = self._get_relevant_documents(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain_core/vectorstores.py", line 674, in _get_relevant_documents
    docs = self.vectorstore.similarity_search(query, **self.search_kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain_community/vectorstores/faiss.py", line 530, in similarity_search
    docs_and_scores = self.similarity_search_with_score(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain_community/vectorstores/faiss.py", line 403, in similarity_search_with_score
    docs = self.similarity_search_with_score_by_vector(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain_community/vectorstores/faiss.py", line 304, in similarity_search_with_score_by_vector
    scores, indices = self.index.search(vector, k if filter is None else fetch_k)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/faiss/class_wrappers.py", line 329, in replacement_search
    assert d == self.d
AssertionError
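In AutoGPT-style memory setups like the one in this traceback, a common way to hit this assertion is creating the FAISS memory index with a hard-coded embedding size that does not match the embedding model used for retrieval. A minimal sketch of that mismatch follows; the model choice and the 1536 constant are illustrative assumptions, not necessarily the exact code above.

# Sketch of the dimension mismatch (model name and 1536 constant are assumptions).
import faiss
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings()        # defaults to all-mpnet-base-v2, which emits 768-dim vectors
embedding_size = 1536                       # value copied from OpenAI-based examples
index = faiss.IndexFlatL2(embedding_size)   # index built for 1536-dim vectors
vectorstore = FAISS(embeddings, index, InMemoryDocstore({}), {})

# Every query embedding has length 768 while index.d is 1536, so
# `assert d == self.d` fails inside similarity_search. Building the index
# from the model's real dimension avoids it:
dim = len(embeddings.embed_query("probe"))  # 768 for all-mpnet-base-v2
index = faiss.IndexFlatL2(dim)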