Open man-qa opened 4 months ago
Unexpected error during reindex of docs with PPTX files inside
coworker-ai | INFO: 24-07-2024 21:05:05 [storageLogger.chat_manager] Re-indexing the documents
coworker-ai | INFO: 24-07-2024 21:05:05 [storageLogger.vector_db] Re-indexing documents
coworker-ai | DEBUG: 24-07-2024 21:05:05 [storageLogger.vector_db] Removing documents
chromadb | INFO: [24-07-2024 21:05:05] 172.21.0.4:35130 - "DELETE /api/v1/collections/docs_and_tickets_collection?tenant=default_tenant&database=default_database HTTP/1.1" 200
coworker-ai | DEBUG: 24-07-2024 21:05:05 [storageLogger.vector_db] Documents removed successfully
coworker-ai | DEBUG: 24-07-2024 21:05:05 [storageLogger.vector_db] Getting or creating the collection: docs_and_tickets_collection
chromadb | INFO: [24-07-2024 21:05:05] 172.21.0.4:35130 - "POST /api/v1/collections?tenant=default_tenant&database=default_database HTTP/1.1" 200
coworker-ai | DEBUG: 24-07-2024 21:05:05 [storageLogger.vector_db] Loading documents
coworker-ai | DEBUG: 24-07-2024 21:05:05 [storageLogger.vector_db] Getting the vector store
coworker-ai | DEBUG: 24-07-2024 21:05:05 [storageLogger.vector_db] Getting or creating the collection: docs_and_tickets_collection
chromadb | DEBUG: [24-07-2024 21:05:05] Collection docs_and_tickets_collection already exists, returning existing collection.
chromadb | INFO: [24-07-2024 21:05:05] 172.21.0.4:35130 - "POST /api/v1/collections?tenant=default_tenant&database=default_database HTTP/1.1" 200
coworker-ai | INFO: 24-07-2024 21:05:05 [storageLogger.data_ingestion] DataIngestionManager initialization
coworker-ai | DEBUG: 24-07-2024 21:05:05 [storageLogger.data_ingestion] Setting up the cache for ingestion pipeline
coworker-ai | INFO: 24-07-2024 21:05:05 [storageLogger.data_ingestion] DataIngestionManager initialized
coworker-ai | DEBUG: 24-07-2024 21:05:05 [storageLogger.vector_db] Loading tickets documents
coworker-ai | INFO: 24-07-2024 21:05:05 [storageLogger.data_ingestion] Running ingestion pipeline for tickets
coworker-ai | DEBUG: 24-07-2024 21:05:05 [storageLogger.data_ingestion] Loading documents from data/tickets
coworker-ai | Traceback (most recent call last):
coworker-ai |   File "/usr/local/lib/python3.12/site-packages/llama_index/readers/file/slides/base.py", line 27, in __init__
coworker-ai |     import torch # noqa
coworker-ai |     ^^^^^^^^^^^^
coworker-ai | ModuleNotFoundError: No module named 'torch'
coworker-ai |
coworker-ai | During handling of the above exception, another exception occurred:
coworker-ai |
coworker-ai | Traceback (most recent call last):
coworker-ai |   File "/usr/local/lib/python3.12/site-packages/gradio/queueing.py", line 541, in process_events
coworker-ai |     response = await route_utils.call_process_api(
coworker-ai |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
coworker-ai |   File "/usr/local/lib/python3.12/site-packages/gradio/route_utils.py", line 276, in call_process_api
coworker-ai |     output = await app.get_blocks().process_api(
coworker-ai |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
coworker-ai |   File "/usr/local/lib/python3.12/site-packages/gradio/blocks.py", line 1928, in process_api
coworker-ai |     result = await self.call_function(
coworker-ai |     ^^^^^^^^^^^^^^^^^^^^^^^^^
coworker-ai |   File "/usr/local/lib/python3.12/site-packages/gradio/blocks.py", line 1514, in call_function
coworker-ai |     prediction = await anyio.to_thread.run_sync(
coworker-ai |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
coworker-ai |   File "/usr/local/lib/python3.12/site-packages/anyio/to_thread.py", line 56, in run_sync
coworker-ai |     return await get_async_backend().run_sync_in_worker_thread(
coworker-ai |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
coworker-ai |   File "/usr/local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
coworker-ai |     return await future
coworker-ai |     ^^^^^^^^^^^^
coworker-ai |   File "/usr/local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 859, in run
coworker-ai |     result = context.run(func, *args)
coworker-ai |     ^^^^^^^^^^^^^^^^^^^^^^^^
coworker-ai |   File "/usr/local/lib/python3.12/site-packages/gradio/utils.py", line 833, in wrapper
coworker-ai |     response = f(*args, **kwargs)
coworker-ai |     ^^^^^^^^^^^^^^^^^^
coworker-ai |   File "/app/src/role/chat_manager.py", line 91, in reindex_documents
coworker-ai |     self._store.reindex_documents()
coworker-ai |   File "/app/src/storage/vector_db.py", line 73, in reindex_documents
coworker-ai |     self._load_documents()
coworker-ai |   File "/app/src/storage/vector_db.py", line 94, in _load_documents
coworker-ai |     data_ingest_manager.run_ingestion_pipeline(document_type=document_type)
coworker-ai |   File "/app/src/storage/data_ingestion.py", line 46, in run_ingestion_pipeline
coworker-ai |     documents = self._load_documents(dir_path=DOCUMENT_PATHS[document_type])
coworker-ai |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
coworker-ai |   File "/app/src/storage/data_ingestion.py", line 88, in _load_documents
coworker-ai |     return SimpleDirectoryReader(dir_path).load_data()
coworker-ai |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
coworker-ai |   File "/usr/local/lib/python3.12/site-packages/llama_index/core/readers/file/base.py", line 688, in load_data
coworker-ai |     SimpleDirectoryReader.load_file(
coworker-ai |   File "/usr/local/lib/python3.12/site-packages/llama_index/core/readers/file/base.py", line 530, in load_file
coworker-ai |     file_extractor[file_suffix] = reader_cls()
coworker-ai |     ^^^^^^^^^^^^
coworker-ai |   File "/usr/local/lib/python3.12/site-packages/llama_index/readers/file/slides/base.py", line 36, in __init__
coworker-ai |     raise ImportError(
coworker-ai | ImportError: Please install extra dependencies that are required for the PptxReader: `pip install torch transformers python-pptx Pillow`
Thank you for your interest in the project! I will add this to my improvement list.
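In the meantime, a pre-flight check could report the missing packages before reindexing starts, instead of failing halfway through with a traceback. Below is a minimal sketch, not the project's code: `PPTX_READER_MODULES` and `missing_pptx_dependencies` are hypothetical names made up for illustration. The package list is taken from the `ImportError` message above, and the check uses only the standard-library `importlib.util.find_spec`. Note that `python-pptx` is imported as `pptx` and `Pillow` as `PIL`.

```python
import importlib.util

# Hypothetical mapping: import name -> pip package name.
# The four packages are the ones named in the PptxReader ImportError.
PPTX_READER_MODULES = {
    "torch": "torch",
    "transformers": "transformers",
    "pptx": "python-pptx",
    "PIL": "Pillow",
}


def missing_pptx_dependencies(modules=PPTX_READER_MODULES):
    """Return the pip names of packages whose import module cannot be found."""
    return [
        pip_name
        for module_name, pip_name in modules.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_pptx_dependencies()
    if missing:
        # Report a single actionable message instead of a nested traceback.
        print("PPTX support unavailable; run: pip install " + " ".join(missing))
    else:
        print("All PptxReader dependencies are importable.")
```

Such a check could run before `reindex_documents` is called, so the UI can show a clear message when PPTX files are present but the extras are not installed.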