I've been following the steps in the README and the video tutorial, but I can't get a .docx file to ingest successfully. Ingestion works fine with .pdf files. Is there anything I need to look into?
This is what I get when I run python3 ingest.py:
`Creating new vectorstore
Loading documents from source_documents
Loading new documents: 0%| | 0/2 [00:02<?, ?it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
^^^^^^^^^^^^^^^^^^^
File "/Users/rehan.arif/Documents/Chat with docs/Ollama/2-ollama-privateGPT-chat-with-docs/ingest.py", line 84, in load_single_document
return loader.load()
^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain/document_loaders/unstructured.py", line 86, in load
elements = self._get_elements()
^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain/document_loaders/word_document.py", line 122, in _get_elements
from unstructured.partition.docx import partition_docx
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/unstructured/partition/docx.py", line 6, in <module>
import docx
ModuleNotFoundError: No module named 'docx'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/rehan.arif/Documents/Chat with docs/Ollama/2-ollama-privateGPT-chat-with-docs/ingest.py", line 161, in <module>
main()
File "/Users/rehan.arif/Documents/Chat with docs/Ollama/2-ollama-privateGPT-chat-with-docs/ingest.py", line 151, in main
texts = process_documents()
^^^^^^^^^^^^^^^^^^^
File "/Users/rehan.arif/Documents/Chat with docs/Ollama/2-ollama-privateGPT-chat-with-docs/ingest.py", line 113, in process_documents
documents = load_documents(source_directory, ignored_files)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/rehan.arif/Documents/Chat with docs/Ollama/2-ollama-privateGPT-chat-with-docs/ingest.py", line 102, in load_documents
for i, docs in enumerate(pool.imap_unordered(load_single_document, filtered_files)):
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/pool.py", line 873, in next
raise value
ModuleNotFoundError: No module named 'docx'`
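
The traceback ends at `import docx`, which is the import name of the `python-docx` package on PyPI, so the loader for Word files may simply be missing a dependency in this Python 3.11 install. A minimal sketch for checking that, assuming it is run with the same `python3` used for `ingest.py`; `missing_modules` is a hypothetical helper, not part of the project:

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    # find_spec returns None when a top-level module is not installed.
    # Only top-level names are handled here; dotted names would need extra care.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "docx" is the module named in the ModuleNotFoundError above.
    required = ["docx"]
    print("Interpreter:", sys.executable)
    print("Missing modules:", missing_modules(required) or "none")
```

If `docx` shows up as missing, installing it into that same interpreter with `python3 -m pip install python-docx` may be enough to get past this error, though other loaders could have missing dependencies of their own.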