crate / cratedb-examples

A collection of clear and concise examples of how to work with CrateDB.
Apache License 2.0

[LangChain] FAILED test.py::test_file[vector_search.py] - ValueError: Collection not found #580

Open amotl opened 2 weeks ago

amotl commented 2 weeks ago

Problem

The scheduled test runs for the LangChain integration have been failing intermittently. The failures started three weeks ago, the workflow then went green for three runs, and afterwards it turned red again.

See: https://github.com/crate/cratedb-examples/actions/workflows/ml-langchain.yml

Details

This is probably the root cause?

ERROR    langchain_community.document_loaders.url:url.py:145 Error fetching or processing https://github.com/langchain-ai/langchain/raw/v0.0.325/docs/docs/modules/state_of_the_union.txt, exception: 

Traceback

------------------------------ Captured log call -------------------------------
ERROR    langchain_community.document_loaders.url:url.py:145 Error fetching or processing https://github.com/langchain-ai/langchain/raw/v0.0.325/docs/docs/modules/state_of_the_union.txt, exception: 
**********************************************************************
  Resource punkt_tab not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt_tab')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt_tab/english/

  Searched in:
    - '/home/runner/nltk_data'
    - '/opt/hostedtoolcache/Python/3.10.14/x64/nltk_data'
    - '/opt/hostedtoolcache/Python/3.10.14/x64/share/nltk_data'
    - '/opt/hostedtoolcache/Python/3.10.14/x64/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************
=============================== warnings summary ===============================
test.py::test_notebook[conversational_memory.ipynb]
  /opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/nbformat/__init__.py:96: MissingIDFieldWarning: Cell is missing an id field, this will become a hard error in future nbformat versions. You may want to use `normalize()` on your notebooks before validations (available since nbformat 5.1.4). Previous versions of nbformat are fixing this issue transparently, and will stop doing so in the future.
    validate(nb)

test.py::test_file[conversational_memory.py]
  /opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/langchain_community/chat_message_histories/sql.py:143: LangChainDeprecationWarning: `connection_string` was deprecated in LangChain 0.2.2 and will be removed in 0.3.0. Use Use connection instead instead.
    warn_deprecated(

test.py::test_file[vector_search.py]
  /opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/langchain_community/vectorstores/pgvector.py:322: LangChainPendingDeprecationWarning: Please use JSONB instead of JSON for metadata. This change will allow for more efficient querying that involves filtering based on metadata.Please note that filtering operators have been changed when using JSOB metadata to be prefixed with a $ sign to avoid name collisions with columns. If you're using an existing database, you will need to create adb migration for your metadata column to be JSONB and update your queries to use the new operators. 
    warn_deprecated(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED test.py::test_file[vector_search.py] - ValueError: Collection not found

amotl commented 2 weeks ago

Thoughts I

Maybe the code simply can't fetch https://github.com/langchain-ai/langchain/raw/v0.0.325/docs/docs/modules/state_of_the_union.txt? The URL works when probing it with my browser, but the situation might be different on CI/GHA.
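
One way to check this from inside the CI job, rather than from a browser, would be a small probe like the sketch below. It uses only the Python standard library. The function name `probe_url` and its injectable `opener` parameter are hypothetical, introduced here for illustration, and are not part of the test suite.

```python
import urllib.error
import urllib.request

# URL taken verbatim from the error message in the captured log.
URL = (
    "https://github.com/langchain-ai/langchain/raw/v0.0.325/"
    "docs/docs/modules/state_of_the_union.txt"
)


def probe_url(url, opener=urllib.request.urlopen, timeout=10):
    """Try to fetch `url` and return a tuple `(ok, detail)`.

    `opener` defaults to `urllib.request.urlopen`; it is a parameter only
    so that the function can be exercised without network access.
    """
    try:
        with opener(url, timeout=timeout) as response:
            body = response.read()
            return True, f"HTTP {response.status}, {len(body)} bytes"
    except urllib.error.HTTPError as ex:
        # The server answered, but with an error status (e.g. 404 or 429).
        return False, f"HTTP {ex.code}: {ex.reason}"
    except OSError as ex:
        # Covers URLError, DNS failures, refused connections, and timeouts.
        return False, f"{type(ex).__name__}: {ex}"
```

Running `probe_url(URL)` in a workflow step just before the tests and printing the result would show whether the runner can reach the file at all, and with which HTTP status.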

Thoughts II

On the other hand, there is also this message from NLTK, which suggests running nltk.download('punkt_tab'):

Resource punkt_tab not found.
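
The message names its own remedy: obtaining `punkt_tab` with the NLTK Downloader. Below is a minimal sketch of doing that before the tests run, assuming NLTK is installed in the test environment. The helper names `resource_present` and `ensure_punkt_tab` are hypothetical, and the directory check is a simplified stand-in for NLTK's own lookup (for example, it does not look inside zip archives).

```python
from pathlib import Path

# Resource path taken from the log line
# "Attempted to load tokenizers/punkt_tab/english/".
RESOURCE = "tokenizers/punkt_tab/english"


def resource_present(search_dirs, resource=RESOURCE):
    """Return True if `resource` exists as a directory below any search dir."""
    return any((Path(base) / resource).is_dir() for base in search_dirs)


def ensure_punkt_tab(search_dirs):
    """Download `punkt_tab` unless it is already on disk.

    Returns True if a download was attempted, False if nothing was needed.
    """
    if resource_present(search_dirs):
        return False
    # Imported lazily, so the presence check itself needs only the stdlib.
    import nltk

    nltk.download("punkt_tab")
    return True
```

For `search_dirs`, the directories listed under "Searched in:" in the traceback could be passed, for example `["/home/runner/nltk_data"]`. Whether this alone makes `vector_search.py` pass is an open question, since the fetch error and the missing resource may or may not be the same problem.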