SciPhi-AI / R2R

The most advanced Retrieval-Augmented Generation (RAG) system, containerized and RESTful
https://r2r-docs.sciphi.ai/
MIT License

Graph Creation Request Fails When Specifying List of document_ids #1116

Open br00t4c opened 2 months ago

br00t4c commented 2 months ago

Describe the bug

Calling create_graph() with a list of document_ids via the Python SDK fails with a SQLAlchemy error (operator does not exist: uuid = text)

To Reproduce

Steps to reproduce the behavior:

  1. Ingest documents
  2. Enumerate document ids by calling get_all_documents()
  3. Attempt to initiate graph generation by calling create_graph() with the list of document_ids from step 2
  4. See error:
[
  "This step failed with error (psycopg2.errors.UndefinedFunction) operator does not exist: uuid = text\nLINE 4:          WHERE document_id = ANY(ARRAY['e28e8cd3-03e8-5b07-8...\n                                   ^\nHINT:  No operator matches the given name and argument types. You might need to add explicit type casts.\n\n[SQL: \n            SELECT document_id, group_ids, user_id, type, metadata, title, version, size_in_bytes, ingestion_status, created_at, updated_at, restructuring_status\n            FROM document_info_local_llm_neo4j_kg\n         WHERE document_id = ANY(%(document_ids)s)\n            ORDER BY created_at DESC\n            OFFSET %(offset)s\n            LIMIT 100\n        ]\n[parameters: {'document_ids': ['e28e8cd3-03e8-5b07-85c3-2beef509fbb0', 'b54b0b42-e0cd-5f18-b66c-9c2d6ab197ab', 'd8e36b31-7886-5b6d-8f56-8efec19f5bf9', '157b2b58-0fdc-58db-93aa-84ed ... (3700 characters truncated) ... -e036-5851-8ebc-8c92ef765a25', '041c2250-1da4-51e3-932e-8d0364daccf1', 'ea553d32-d2ff-5604-9437-ae49b8a0e0a1', '1c7f0a3e-4f40-5339-801a-6a53c408ec44'], 'offset': 0}]\n(Background on this error at: https://sqlalche.me/e/20/f405)\nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py\", line 1967, in _exec_single_context\n    self.dialect.do_execute(\n  File \"/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py\", line 941, in do_execute\n    cursor.execute(statement, parameters)\npsycopg2.errors.UndefinedFunction: operator does not exist: uuid = text\nLINE 4:          WHERE document_id = ANY(ARRAY['e28e8cd3-03e8-5b07-8...\n                                   ^\nHINT:  No operator matches the given name and argument types. 
You might need to add explicit type casts.\n\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.10/site-packages/hatchet_sdk/worker/runner/runner.py\", line 191, in inner_callback\n    output = task.result()\n  File \"/usr/local/lib/python3.10/site-packages/hatchet_sdk/worker/runner/runner.py\", line 309, in async_wrapped_action_func\n    raise e\n  File \"/usr/local/lib/python3.10/site-packages/hatchet_sdk/worker/runner/runner.py\", line 285, in async_wrapped_action_func\n    return await action_func(context)\n  File \"/app/core/main/hatchet/restructure_workflow.py\", line 83, in kg_extraction_ingress\n    documents_overviews = self.restructure_service.providers.database.relational.get_documents_overview(\n  File \"/app/core/providers/database/document.py\", line 117, in get_documents_overview\n    results = self.execute_query(query, params).fetchall()\n  File \"/app/core/providers/database/relational.py\", line 31, in execute_query\n    return execute_query(self.vx, query, params)\n  File \"/app/core/providers/database/base.py\", line 16, in execute_query\n    result = sess.execute(query, params or {})\n  File \"/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py\", line 2362, in execute\n    return self._execute_internal(\n  File \"/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py\", line 2256, in _execute_internal\n    result = conn.execute(\n  File \"/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py\", line 1418, in execute\n    return meth(\n  File \"/usr/local/lib/python3.10/site-packages/sqlalchemy/sql/elements.py\", line 515, in _execute_on_connection\n    return connection._execute_clauseelement(\n  File \"/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py\", line 1640, in _execute_clauseelement\n    ret = self._execute_context(\n  File 
\"/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py\", line 1846, in _execute_context\n    return self._exec_single_context(\n  File \"/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py\", line 1986, in _exec_single_context\n    self._handle_dbapi_exception(\n  File \"/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py\", line 2355, in _handle_dbapi_exception\n    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e\n  File \"/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py\", line 1967, in _exec_single_context\n    self.dialect.do_execute(\n  File \"/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py\", line 941, in do_execute\n    cursor.execute(statement, parameters)\nsqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedFunction) operator does not exist: uuid = text\nLINE 4:          WHERE document_id = ANY(ARRAY['e28e8cd3-03e8-5b07-8...\n                                   ^\nHINT:  No operator matches the given name and argument types. You might need to add explicit type casts.\n\n[SQL: \n            SELECT document_id, group_ids, user_id, type, metadata, title, version, size_in_bytes, ingestion_status, created_at, updated_at, restructuring_status\n            FROM document_info_local_llm_neo4j_kg\n         WHERE document_id = ANY(%(document_ids)s)\n            ORDER BY created_at DESC\n            OFFSET %(offset)s\n            LIMIT 100\n        ]\n[parameters: {'document_ids': ['e28e8cd3-03e8-5b07-85c3-2beef509fbb0', 'b54b0b42-e0cd-5f18-b66c-9c2d6ab197ab', 'd8e36b31-7886-5b6d-8f56-8efec19f5bf9', '157b2b58-0fdc-58db-93aa-84ed ... (3700 characters truncated) ... -e036-5851-8ebc-8c92ef765a25', '041c2250-1da4-51e3-932e-8d0364daccf1', 'ea553d32-d2ff-5604-9437-ae49b8a0e0a1', '1c7f0a3e-4f40-5339-801a-6a53c408ec44'], 'offset': 0}]\n(Background on this error at: https://sqlalche.me/e/20/f405)\n"
]
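The HINT in the trace points at a type mismatch: the id list is bound as an array of text, while the document_id column is a uuid. A minimal sketch of one possible workaround follows, assuming the query is built by hand and sent with psycopg2-style named parameters as in the trace. The helper name `build_overview_query` and the shortened column list are hypothetical, and this is not necessarily the fix the maintainers will ship.

```python
import uuid


def build_overview_query(table_name, document_ids, offset=0, limit=100):
    """Hypothetical helper: return (sql, params) with the id list cast to uuid[]."""
    # The table name is interpolated into the SQL text, so only accept a plain identifier.
    if not table_name.isidentifier():
        raise ValueError(f"unexpected table name: {table_name!r}")

    # uuid.UUID() raises ValueError on malformed ids, so bad input fails before the query runs.
    ids = [str(uuid.UUID(str(doc_id))) for doc_id in document_ids]

    # CAST(... AS uuid[]) makes PostgreSQL compare uuid with uuid instead of uuid with text.
    sql = (
        "SELECT document_id, title, created_at\n"
        f"FROM {table_name}\n"
        "WHERE document_id = ANY(CAST(%(document_ids)s AS uuid[]))\n"
        "ORDER BY created_at DESC\n"
        "OFFSET %(offset)s\n"
        "LIMIT %(limit)s"
    )
    params = {"document_ids": ids, "offset": offset, "limit": limit}
    return sql, params
```

The CAST form is used here instead of the `::uuid[]` shorthand because a double colon placed right after a named bind parameter can be misparsed by some query layers.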

Expected behavior

Successful initiation of extraction for graph generation

Screenshots

Screenshot from 2024-09-11 09-43-40

Additional context

r2r version: 3.1.20

shreyaspimpalgaonkar commented 2 months ago

Looks like there's an issue with the SQL query. I'll push a fix shortly. Meanwhile, could you try running create-graph with no args? It should run on all ingested documents.

br00t4c commented 2 months ago

Yeah, everything works fine from the CLI or when calling create_graph() without the document_ids param specified ;) Thanks for the quick reply, I'll wait for the next bug fix release. BUT, if I'm not mistaken, when you launch a graph creation request the available documents are enumerated by a call to get_all_documents(), which defaults to a maximum of 100 documents. So if your document collection has more than 100 documents, you will have to call create_graph() multiple times to ensure all documents are extracted.
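If the 100-document default described above does apply, one way to enumerate every id is to page through the listing until a short page comes back. The sketch below is illustrative only: `fetch_page(offset, limit)` is a hypothetical stand-in for whatever call returns one page of document ids, not an R2R SDK function, and the fake data exists only to make the example runnable.

```python
from typing import Callable, List, Sequence


def collect_all_ids(
    fetch_page: Callable[[int, int], Sequence[str]],
    page_size: int = 100,
) -> List[str]:
    """Call fetch_page(offset, limit) repeatedly until a page shorter than page_size arrives."""
    ids: List[str] = []
    offset = 0
    while True:
        page = list(fetch_page(offset, page_size))
        ids.extend(page)
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            break
        offset += page_size
    return ids


# Hypothetical stand-in for a paged listing call, backed by an in-memory list.
fake_ids = [f"doc-{i}" for i in range(250)]


def fake_fetch(offset: int, limit: int) -> Sequence[str]:
    return fake_ids[offset:offset + limit]


all_ids = collect_all_ids(fake_fetch)
```

With the full list in hand, the ids could then be passed to graph creation in batches once the document_ids query is fixed.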