PromtEngineer / localGPT

Chat with your documents on your local device using GPT models. No data leaves your device, and it is 100% private.
Apache License 2.0
19.53k stars · 2.19k forks

0 chunks indexError: list index is out of range #602

Open leviathofnoesia opened 8 months ago

leviathofnoesia commented 8 months ago

2023-10-21 08:18:04,561 - INFO - ingest.py:153 - Loaded 1429 documents from C:\Users\billy\localGPT/SOURCE_DOCUMENTS
2023-10-21 08:18:04,561 - INFO - ingest.py:154 - Split into 0 chunks of text
2023-10-21 08:18:06,628 - INFO - SentenceTransformer.py:66 - Load pretrained SentenceTransformer: hkunlp/instructor-large
load INSTRUCTOR_Transformer
max_seq_length  512
Traceback (most recent call last):
  File "C:\Users\billy\localGPT\ingest.py", line 181, in <module>
    main()
  File "C:\Users\billy\anaconda3\envs\localGPT\lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\billy\anaconda3\envs\localGPT\lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "C:\Users\billy\anaconda3\envs\localGPT\lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\billy\anaconda3\envs\localGPT\lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\billy\localGPT\ingest.py", line 168, in main
    db = Chroma.from_documents(
  File "C:\Users\billy\anaconda3\envs\localGPT\lib\site-packages\langchain\vectorstores\chroma.py", line 613, in from_documents
    return cls.from_texts(
  File "C:\Users\billy\anaconda3\envs\localGPT\lib\site-packages\langchain\vectorstores\chroma.py", line 577, in from_texts
    chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)
  File "C:\Users\billy\anaconda3\envs\localGPT\lib\site-packages\langchain\vectorstores\chroma.py", line 187, in add_texts
    embeddings = self._embedding_function.embed_documents(texts)
  File "C:\Users\billy\anaconda3\envs\localGPT\lib\site-packages\langchain\embeddings\huggingface.py", line 169, in embed_documents
    embeddings = self.client.encode(instruction_pairs, **self.encode_kwargs)
  File "C:\Users\billy\anaconda3\envs\localGPT\lib\site-packages\InstructorEmbedding\instructor.py", line 524, in encode
    if isinstance(sentences[0],list):
IndexError: list index out of range
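The telling line is "Split into 0 chunks of text": the documents loaded, but no usable text survived splitting, so the empty list reaches InstructorEmbedding, which indexes `sentences[0]`. A defensive guard (hypothetical; `ensure_chunks` and its message are my assumptions, not localGPT code) would fail earlier with a clearer error:

```python
# Hypothetical guard for ingest.py: fail fast with a descriptive message
# when the splitter produced no chunks, instead of letting the embedder
# raise "IndexError: list index out of range" on an empty list.
def ensure_chunks(texts):
    if not texts:
        raise ValueError(
            "0 chunks were produced from SOURCE_DOCUMENTS. Check for empty "
            "files, unsupported formats, or loader errors (see "
            "file_ingest.log) before building the Chroma index."
        )
    return texts
```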

a-l-e-it commented 8 months ago

Hi, I noticed this error in "file_ingest.log": "libGL.so.1: cannot open shared object file: No such file or directory". To solve it, I installed some packages: "apt-get install ffmpeg libsm6 libxext6 -y"

ghost commented 8 months ago

Hi, I noticed this error in "file_ingest.log": "libGL.so.1: cannot open shared object file: No such file or directory". To solve it, I installed some packages: "apt-get install ffmpeg libsm6 libxext6 -y"

I have the same issue. How do I run that command?

The VS Code terminal says "zsh: command not found: apt-get"

cyber-nic commented 8 months ago

I encountered this (or a similar?) issue following the Docker instructions:

$ docker build . -t localgpt
[+] Building 8758.4s (14/15)                                                                                                                                                                        docker:default
 => [internal] load build definition from Dockerfile                                                                                                                                                          0.5s
 => => transferring dockerfile: 1.34kB                                                                                                                                                                        0.0s
 => [internal] load .dockerignore                                                                                                                                                                             0.4s
 => => transferring context: 84B                                                                                                                                                                              0.0s
 => resolve image config for docker.io/docker/dockerfile:1                                                                                                                                                    1.8s
 => docker-image://docker.io/docker/dockerfile:1@sha256:ac85f380a63b13dfcefa89046420e1781752bab202122f8f50032edf31be0021                                                                                     20.0s
 => => resolve docker.io/docker/dockerfile:1@sha256:ac85f380a63b13dfcefa89046420e1781752bab202122f8f50032edf31be0021                                                                                          0.1s
 => => sha256:a17ee7fff8f5e97b974f5b48f51647d2cf28d543f2aa6c11aaa0ea431b44bb89 1.27kB / 1.27kB                                                                                                                0.0s
 => => sha256:9d9c93f4b00be908ab694a4df732570bced3b8a96b7515d70ff93402179ad232 11.80MB / 11.80MB                                                                                                             19.4s
 => => sha256:ac85f380a63b13dfcefa89046420e1781752bab202122f8f50032edf31be0021 8.40kB / 8.40kB                                                                                                                0.0s
 => => sha256:657fcc512c7369f4cb3d94ea329150f8daf626bc838b1a1e81f1834c73ecc77e 482B / 482B                                                                                                                    0.0s
 => => extracting sha256:9d9c93f4b00be908ab694a4df732570bced3b8a96b7515d70ff93402179ad232                                                                                                                     0.1s
 => [internal] load metadata for docker.io/nvidia/cuda:11.7.1-runtime-ubuntu22.04                                                                                                                             2.3s
 => [internal] load build context                                                                                                                                                                             0.2s
 => => transferring context: 1.51MB                                                                                                                                                                           0.0s
 => [stage-0 1/9] FROM docker.io/nvidia/cuda:11.7.1-runtime-ubuntu22.04@sha256:88e583c93103cc5ef229fbbd9ba4ad6e1d3dd36f186bdc6dc53014bebeacf558                                                            7930.1s
 => => resolve docker.io/nvidia/cuda:11.7.1-runtime-ubuntu22.04@sha256:88e583c93103cc5ef229fbbd9ba4ad6e1d3dd36f186bdc6dc53014bebeacf558                                                                       0.2s
 => => sha256:6a32dbf4c971ee8539ab5552d2e3d140a75e89262118a4830917211f766ba89c 2.21kB / 2.21kB                                                                                                                0.0s
 => => sha256:d916bd2c64a8cc6acf3b4206f756052139110519c724ed4195e3696089e460a3 13.07kB / 13.07kB                                                                                                              0.0s
 => => sha256:06539ded452e3219959d498fd6974839da20eb3aa42e0f74982c7e5e238ae863 4.62MB / 4.62MB                                                                                                                3.7s
 => => sha256:88e583c93103cc5ef229fbbd9ba4ad6e1d3dd36f186bdc6dc53014bebeacf558 743B / 743B                                                                                                                    0.0s
 => => sha256:23ddea9c53775c39f9bfab9a019d4b76ef99631f4374ac9790855650673723d7 47.88MB / 47.88MB                                                                                                            162.7s
 => => sha256:ceb5944230f9f8a8e22bc1323cf8bb798aa1b25d2f8a8a4e2a99ce1c5be8d934 184B / 184B                                                                                                                    0.6s
 => => sha256:e9783fd39993a8d51ba3217b92503a56fbceaa2c29e0da28981c8c41313a32f6 6.88kB / 6.88kB                                                                                                                0.8s
 => => sha256:91d0d349e26fd3284603e9b8c6d7721deebd0df3a67c9e18f67801bbf975a82f 1.09GB / 1.09GB                                                                                                             7918.0s
 => => extracting sha256:06539ded452e3219959d498fd6974839da20eb3aa42e0f74982c7e5e238ae863                                                                                                                     0.3s
 => => sha256:3eb4fd8a2aeb59f57d427df0096f7cc87d82d677fa1bb1a23a06d411a93883cb 63.70kB / 63.70kB                                                                                                              4.7s
 => => sha256:ce5cf596bde183231eb0f8421b5bdc5d3c94591697ee13efbeac1a079e6d28a3 1.68kB / 1.68kB                                                                                                                4.9s
 => => sha256:ce5c9df3e52353c388d60d1aab727a0960c587153dfc99bbeec88e644112b674 1.52kB / 1.52kB                                                                                                                5.3s
 => => extracting sha256:23ddea9c53775c39f9bfab9a019d4b76ef99631f4374ac9790855650673723d7                                                                                                                     0.6s
 => => extracting sha256:ceb5944230f9f8a8e22bc1323cf8bb798aa1b25d2f8a8a4e2a99ce1c5be8d934                                                                                                                     0.0s
 => => extracting sha256:e9783fd39993a8d51ba3217b92503a56fbceaa2c29e0da28981c8c41313a32f6                                                                                                                     0.0s
 => => extracting sha256:91d0d349e26fd3284603e9b8c6d7721deebd0df3a67c9e18f67801bbf975a82f                                                                                                                    10.0s
 => => extracting sha256:3eb4fd8a2aeb59f57d427df0096f7cc87d82d677fa1bb1a23a06d411a93883cb                                                                                                                     0.0s
 => => extracting sha256:ce5cf596bde183231eb0f8421b5bdc5d3c94591697ee13efbeac1a079e6d28a3                                                                                                                     0.0s
 => => extracting sha256:ce5c9df3e52353c388d60d1aab727a0960c587153dfc99bbeec88e644112b674                                                                                                                     0.0s
 => [stage-0 2/9] RUN apt-get update && apt-get install -y software-properties-common                                                                                                                        40.1s
 => [stage-0 3/9] RUN apt-get install -y g++-11 make python3 python-is-python3 pip                                                                                                                           37.3s
 => [stage-0 4/9] COPY ./requirements.txt .                                                                                                                                                                   0.2s
 => [stage-0 5/9] RUN --mount=type=cache,target=/root/.cache CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --timeout 100 -r requirements.txt llama-cpp-python==0.1.83                            550.2s
 => [stage-0 6/9] COPY SOURCE_DOCUMENTS ./SOURCE_DOCUMENTS                                                                                                                                                    0.4s
 => [stage-0 7/9] COPY ingest.py constants.py ./                                                                                                                                                              0.2s
 => ERROR [stage-0 8/9] RUN --mount=type=cache,target=/root/.cache python ingest.py --device_type cpu                                                                                                       174.3s
------
 > [stage-0 8/9] RUN --mount=type=cache,target=/root/.cache python ingest.py --device_type cpu:
4.392 2023-11-05 22:13:47,909 - INFO - ingest.py:144 - Loading documents from //SOURCE_DOCUMENTS
4.394 Importing: Orca_paper.pdf
4.402 2023-11-05 22:13:47,919 - INFO - ingest.py:44 - Loading document batch
7.325 //SOURCE_DOCUMENTS/Orca_paper.pdf loaded.
7.325 
7.325 //SOURCE_DOCUMENTS/Orca_paper.pdf loading error: 
7.325 libGL.so.1: cannot open shared object file: No such file or directory
7.325 
7.360 2023-11-05 22:13:50,876 - INFO - ingest.py:153 - Loaded 1 documents from //SOURCE_DOCUMENTS
7.360 2023-11-05 22:13:50,876 - INFO - ingest.py:154 - Split into 0 chunks of text
8.307 2023-11-05 22:13:51,824 - INFO - SentenceTransformer.py:66 - Load pretrained SentenceTransformer: hkunlp/instructor-large
Downloading (…)c7233/.gitattributes: 100%|██████████| 1.48k/1.48k [00:00<00:00, 1.76MB/s]
Downloading (…)_Pooling/config.json: 100%|██████████| 270/270 [00:00<00:00, 480kB/s]
Downloading (…)/2_Dense/config.json: 100%|██████████| 116/116 [00:00<00:00, 296kB/s]
Downloading pytorch_model.bin: 100%|██████████| 3.15M/3.15M [00:00<00:00, 8.06MB/s]
Downloading (…)9fb15c7233/README.md: 100%|██████████| 66.3k/66.3k [00:00<00:00, 907kB/s]
Downloading (…)b15c7233/config.json: 100%|██████████| 1.53k/1.53k [00:00<00:00, 2.78MB/s]
Downloading (…)ce_transformers.json: 100%|██████████| 122/122 [00:00<00:00, 311kB/s]
Downloading pytorch_model.bin: 100%|██████████| 1.34G/1.34G [02:34<00:00, 8.66MB/s]
Downloading (…)nce_bert_config.json: 100%|██████████| 53.0/53.0 [00:00<00:00, 116kB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 2.20k/2.20k [00:00<00:00, 3.44MB/s]
Downloading spiece.model: 100%|██████████| 792k/792k [00:00<00:00, 8.97MB/s]
Downloading (…)c7233/tokenizer.json: 100%|██████████| 2.42M/2.42M [00:00<00:00, 6.37MB/s]
Downloading (…)okenizer_config.json: 100%|██████████| 2.41k/2.41k [00:00<00:00, 5.40MB/s]
Downloading (…)15c7233/modules.json: 100%|██████████| 461/461 [00:00<00:00, 979kB/s]
172.6 load INSTRUCTOR_Transformer
172.6 max_seq_length  512
172.6 Traceback (most recent call last):
172.6   File "//ingest.py", line 181, in <module>
172.6     main()
172.6   File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
172.6     return self.main(*args, **kwargs)
172.6   File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
172.6     rv = self.invoke(ctx)
172.6   File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
172.6     return ctx.invoke(self.callback, **ctx.params)
172.6   File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
172.6     return __callback(*args, **kwargs)
172.6   File "//ingest.py", line 168, in main
172.6     db = Chroma.from_documents(
172.6   File "/usr/local/lib/python3.10/dist-packages/langchain/vectorstores/chroma.py", line 613, in from_documents
172.6     return cls.from_texts(
172.6   File "/usr/local/lib/python3.10/dist-packages/langchain/vectorstores/chroma.py", line 577, in from_texts
172.6     chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)
172.6   File "/usr/local/lib/python3.10/dist-packages/langchain/vectorstores/chroma.py", line 187, in add_texts
172.6     embeddings = self._embedding_function.embed_documents(texts)
172.6   File "/usr/local/lib/python3.10/dist-packages/langchain/embeddings/huggingface.py", line 169, in embed_documents
172.6     embeddings = self.client.encode(instruction_pairs, **self.encode_kwargs)
172.6   File "/usr/local/lib/python3.10/dist-packages/InstructorEmbedding/instructor.py", line 524, in encode
172.6     if isinstance(sentences[0],list):
172.6 IndexError: list index out of range
------
Dockerfile:18
--------------------
  16 |     # If this changes in the future you can `docker build --build-arg device_type=cuda  . -t localgpt` (+GPU argument to be determined).
  17 |     ARG device_type=cpu
  18 | >>> RUN --mount=type=cache,target=/root/.cache python ingest.py --device_type $device_type
  19 |     COPY . .
  20 |     ENV device_type=cuda
--------------------
ERROR: failed to solve: process "/bin/sh -c python ingest.py --device_type $device_type" did not complete successfully: exit code: 1

marcb152 commented 7 months ago

encountered this (or similar?) issue following docker instructions:

Hi, I encountered the exact same issue as yours while building the Dockerfile. To solve it, as @a-l-e-it suggested, I appended && apt-get install ffmpeg libsm6 libxext6 -y to the line RUN apt-get update && apt-get install -y software-properties-common, so the modified line looks like this:

RUN apt-get update && apt-get install -y software-properties-common && apt-get install ffmpeg libsm6 libxext6 -y

Which works!

henryuylearning commented 7 months ago

Exactly the same issue here, any update?

marzekan commented 6 months ago

Hi, I'm running python ingest.py --device_type cpu on Windows 10 and receive the same error. I managed to get it working by specifying the text encoding when instantiating the langchain loader class in the load_single_document() function.

I changed loader = loader_class(file_path) to loader = loader_class(file_path, encoding='UTF-8') and it works fine now.
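A stdlib-only sketch of why the explicit encoding matters (langchain's text loaders forward `encoding` to Python's `open()`, which otherwise falls back to the platform default, e.g. cp1252 on Windows; `read_document` here is an illustrative stand-in, not localGPT code):

```python
import os
import tempfile

def read_document(file_path, encoding=None):
    # Mirrors loader_class(file_path, encoding='UTF-8'): with no explicit
    # encoding, open() uses the locale's default codec, so a UTF-8 file
    # can fail to decode (or decode as mojibake) on Windows.
    with open(file_path, encoding=encoding) as f:
        return f.read()

# Write a UTF-8 file containing non-ASCII text, then read it back.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write("café résumé")

text = read_document(path, encoding="UTF-8")  # decodes correctly on any OS
```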

Hope it helps

henryuylearning commented 6 months ago

Thanks a lot! Everything was running smoothly on the CPU, but I encountered an error when switching to CUDA. I managed to resolve it by reinstalling charset_normalizer; I just realized I hadn't updated it. Here's what I did:

pip uninstall charset_normalizer
pip install charset_normalizer

Thanks again for all your help and kindness 😊


PayteR commented 5 months ago

I'm a Windows user. I deleted CUDA 12.3 and installed CUDA 12.1 instead, because https://pytorch.org/ lists 12.1 in the Compute Platform row. Then I reinstalled PyTorch with the command given there.

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

And now CUDA works fine.

adelton commented 3 months ago

I see that

 IndexError: list index out of range

error when I happen to have an empty file in SOURCE_DOCUMENTS.
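A quick pre-flight check (a hypothetical helper, not part of localGPT) can surface such files before running ingest.py:

```python
import os

def find_empty_files(root):
    # Walk SOURCE_DOCUMENTS and report zero-byte files: an empty file
    # "loads" successfully but contributes no text, which can leave the
    # splitter with 0 chunks and trigger the IndexError downstream.
    empty = []
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)
```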

bilal-ismail commented 1 month ago


As I am using --device_type cpu, at the top of this log I had a line mentioning loading error: operator torchvision::nms does not exist. I resolved it by uninstalling and reinstalling the following packages: torch and torchvision.

pip uninstall torch torchvision
pip install torch torchvision

This worked for me in resolving both the index-out-of-range error and the "nms not found" load error. Might be helpful for others.