Stihotvor / coworker_ai


Unexpected error during reindex of docs with PPTX files inside #12

Open man-qa opened 4 months ago

man-qa commented 4 months ago

Re-indexing fails with an unexpected error when the documents include PPTX files.

coworker-ai  | INFO:   24-07-2024 21:05:05 [storageLogger.chat_manager] Re-indexing the documents
coworker-ai  | INFO:   24-07-2024 21:05:05 [storageLogger.vector_db] Re-indexing documents
coworker-ai  | DEBUG:   24-07-2024 21:05:05 [storageLogger.vector_db] Removing documents
chromadb     | INFO:     [24-07-2024 21:05:05] 172.21.0.4:35130 - "DELETE /api/v1/collections/docs_and_tickets_collection?tenant=default_tenant&database=default_database HTTP/1.1" 200
coworker-ai  | DEBUG:   24-07-2024 21:05:05 [storageLogger.vector_db] Documents removed successfully
coworker-ai  | DEBUG:   24-07-2024 21:05:05 [storageLogger.vector_db] Getting or creating the collection: docs_and_tickets_collection
chromadb     | INFO:     [24-07-2024 21:05:05] 172.21.0.4:35130 - "POST /api/v1/collections?tenant=default_tenant&database=default_database HTTP/1.1" 200
coworker-ai  | DEBUG:   24-07-2024 21:05:05 [storageLogger.vector_db] Loading documents
coworker-ai  | DEBUG:   24-07-2024 21:05:05 [storageLogger.vector_db] Getting the vector store
coworker-ai  | DEBUG:   24-07-2024 21:05:05 [storageLogger.vector_db] Getting or creating the collection: docs_and_tickets_collection
chromadb     | DEBUG:    [24-07-2024 21:05:05] Collection docs_and_tickets_collection already exists, returning existing collection.
chromadb     | INFO:     [24-07-2024 21:05:05] 172.21.0.4:35130 - "POST /api/v1/collections?tenant=default_tenant&database=default_database HTTP/1.1" 200
coworker-ai  | INFO:   24-07-2024 21:05:05 [storageLogger.data_ingestion] DataIngestionManager initialization
coworker-ai  | DEBUG:   24-07-2024 21:05:05 [storageLogger.data_ingestion] Setting up the cache for ingestion pipeline
coworker-ai  | INFO:   24-07-2024 21:05:05 [storageLogger.data_ingestion] DataIngestionManager initialized
coworker-ai  | DEBUG:   24-07-2024 21:05:05 [storageLogger.vector_db] Loading tickets documents
coworker-ai  | INFO:   24-07-2024 21:05:05 [storageLogger.data_ingestion] Running ingestion pipeline for tickets
coworker-ai  | DEBUG:   24-07-2024 21:05:05 [storageLogger.data_ingestion] Loading documents from data/tickets
coworker-ai  | Traceback (most recent call last):
coworker-ai  |   File "/usr/local/lib/python3.12/site-packages/llama_index/readers/file/slides/base.py", line 27, in __init__
coworker-ai  |     import torch  # noqa
coworker-ai  |     ^^^^^^^^^^^^
coworker-ai  | ModuleNotFoundError: No module named 'torch'
coworker-ai  | 
coworker-ai  | During handling of the above exception, another exception occurred:
coworker-ai  | 
coworker-ai  | Traceback (most recent call last):
coworker-ai  |   File "/usr/local/lib/python3.12/site-packages/gradio/queueing.py", line 541, in process_events
coworker-ai  |     response = await route_utils.call_process_api(
coworker-ai  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
coworker-ai  |   File "/usr/local/lib/python3.12/site-packages/gradio/route_utils.py", line 276, in call_process_api
coworker-ai  |     output = await app.get_blocks().process_api(
coworker-ai  |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
coworker-ai  |   File "/usr/local/lib/python3.12/site-packages/gradio/blocks.py", line 1928, in process_api
coworker-ai  |     result = await self.call_function(
coworker-ai  |              ^^^^^^^^^^^^^^^^^^^^^^^^^
coworker-ai  |   File "/usr/local/lib/python3.12/site-packages/gradio/blocks.py", line 1514, in call_function
coworker-ai  |     prediction = await anyio.to_thread.run_sync(
coworker-ai  |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
coworker-ai  |   File "/usr/local/lib/python3.12/site-packages/anyio/to_thread.py", line 56, in run_sync
coworker-ai  |     return await get_async_backend().run_sync_in_worker_thread(
coworker-ai  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
coworker-ai  |   File "/usr/local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
coworker-ai  |     return await future
coworker-ai  |            ^^^^^^^^^^^^
coworker-ai  |   File "/usr/local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 859, in run
coworker-ai  |     result = context.run(func, *args)
coworker-ai  |              ^^^^^^^^^^^^^^^^^^^^^^^^
coworker-ai  |   File "/usr/local/lib/python3.12/site-packages/gradio/utils.py", line 833, in wrapper
coworker-ai  |     response = f(*args, **kwargs)
coworker-ai  |                ^^^^^^^^^^^^^^^^^^
coworker-ai  |   File "/app/src/role/chat_manager.py", line 91, in reindex_documents
coworker-ai  |     self._store.reindex_documents()
coworker-ai  |   File "/app/src/storage/vector_db.py", line 73, in reindex_documents
coworker-ai  |     self._load_documents()
coworker-ai  |   File "/app/src/storage/vector_db.py", line 94, in _load_documents
coworker-ai  |     data_ingest_manager.run_ingestion_pipeline(document_type=document_type)
coworker-ai  |   File "/app/src/storage/data_ingestion.py", line 46, in run_ingestion_pipeline
coworker-ai  |     documents = self._load_documents(dir_path=DOCUMENT_PATHS[document_type])
coworker-ai  |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
coworker-ai  |   File "/app/src/storage/data_ingestion.py", line 88, in _load_documents
coworker-ai  |     return SimpleDirectoryReader(dir_path).load_data()
coworker-ai  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
coworker-ai  |   File "/usr/local/lib/python3.12/site-packages/llama_index/core/readers/file/base.py", line 688, in load_data
coworker-ai  |     SimpleDirectoryReader.load_file(
coworker-ai  |   File "/usr/local/lib/python3.12/site-packages/llama_index/core/readers/file/base.py", line 530, in load_file
coworker-ai  |     file_extractor[file_suffix] = reader_cls()
coworker-ai  |                                   ^^^^^^^^^^^^
coworker-ai  |   File "/usr/local/lib/python3.12/site-packages/llama_index/readers/file/slides/base.py", line 36, in __init__
coworker-ai  |     raise ImportError(
coworker-ai  | ImportError: Please install extra dependencies that are required for the PptxReader: `pip install torch transformers python-pptx Pillow`
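The traceback shows that `SimpleDirectoryReader(dir_path).load_data()` builds a `PptxReader` as soon as it sees a `.pptx` file, and that reader's `__init__` raises `ImportError` when `torch` (and the other extras it lists) are not installed. One possible workaround is sketched below, under assumptions. It checks for those modules up front and, if any are missing, tells `SimpleDirectoryReader` to skip `*.pptx` files rather than crash the whole re-index. The helper names (`PPTX_READER_MODULES`, `missing_modules`, `pptx_exclude_patterns`, `load_documents`) are hypothetical, not part of coworker_ai. The sketch also assumes the installed `llama-index-core` release accepts glob patterns in the `exclude` argument.

```python
import importlib.util
import logging

logger = logging.getLogger(__name__)

# Import names for the extras listed in the PptxReader ImportError:
# torch, transformers, python-pptx (imported as "pptx") and Pillow ("PIL").
PPTX_READER_MODULES = ("torch", "transformers", "pptx", "PIL")


def missing_modules(modules):
    """Return the names in `modules` that cannot be found on the import path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def pptx_exclude_patterns(missing):
    """Skip .pptx files only when PptxReader's dependencies are unavailable."""
    return ["*.pptx"] if missing else []


def load_documents(dir_path):
    """Hypothetical stand-in for DataIngestionManager._load_documents."""
    # Imported lazily so the helpers above work without llama-index installed.
    from llama_index.core import SimpleDirectoryReader

    missing = missing_modules(PPTX_READER_MODULES)
    if missing:
        logger.warning(
            "Skipping .pptx files in %s; missing modules: %s",
            dir_path,
            ", ".join(missing),
        )
    # Assumption: `exclude` takes glob patterns relative to dir_path.
    reader = SimpleDirectoryReader(dir_path, exclude=pptx_exclude_patterns(missing))
    return reader.load_data()
```

The other option is the one the error message itself suggests: install `torch transformers python-pptx Pillow` in the image. That keeps PPTX content searchable but adds a large dependency (`torch`), so skipping with a warning may be the lighter default.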
Stihotvor commented 4 months ago

Thank you for your interest in the project! I will add it to my improvement list.