Open curvedinf opened 2 months ago
I'm seeing a possibly related error; when loading the cached embeddings, it states:
Creating index from embeddings...
Loading local LLM model...
LLM context size: 9200
Then, when asking a question, it started writing out the answer and after a while returned:
(generating contextual guidance...)Traceback (most recent call last):
File "/home/user/.local/bin/dir-assistant", line 8, in <module>
sys.exit(main())
File "/home/user/.local/lib/python3.10/site-packages/dir_assistant/main.py", line 153, in main
start(args, config_dict["DIR_ASSISTANT"])
File "/home/user/.local/lib/python3.10/site-packages/dir_assistant/cli/start.py", line 166, in start
llm.stream_chat(user_input)
File "/home/user/.local/lib/python3.10/site-packages/dir_assistant/assistant/git_assistant.py", line 101, in stream_chat
super().stream_chat(user_input)
File "/home/user/.local/lib/python3.10/site-packages/dir_assistant/assistant/base_assistant.py", line 132, in stream_chat
stream_output = self.run_stream_processes(user_input, True)
File "/home/user/.local/lib/python3.10/site-packages/dir_assistant/assistant/cgrag_assistant.py", line 98, in run_stream_processes
output_history = self.run_completion_generator(cgrag_generator, output_history, False)
File "/home/user/.local/lib/python3.10/site-packages/dir_assistant/assistant/llama_cpp_assistant.py", line 49, in run_completion_generator
for chunk in completion_output:
File "/home/user/.local/lib/python3.10/site-packages/llama_cpp/llama_chat_format.py", line 289, in _convert_text_completion_chunks_to_chat
for i, chunk in enumerate(chunks):
File "/home/user/.local/lib/python3.10/site-packages/llama_cpp/llama.py", line 1216, in _create_completion
for token in self.generate(
File "/home/user/.local/lib/python3.10/site-packages/llama_cpp/llama.py", line 808, in generate
self.eval(tokens)
File "/home/user/.local/lib/python3.10/site-packages/llama_cpp/llama.py", line 658, in eval
self.scores[n_past + n_tokens - 1, :].reshape(-1)[::] = logits
IndexError: index 9200 is out of bounds for axis 0 with size 9200
Is this the same issue?
No, that issue is most likely caused by having no files to load in the current directory. I've added a descriptive message and a controlled exit in the version I'll push up the next time I update. For now, add an empty file to start.
There are definitely files, and the code it initially started to give back was good; it then failed with the error part-way through. This is with the default models/config and CUDA enabled. I'll retest when the next update is available anyway, no rush :)
I'll take a look through and see what could be causing it.
When chatting with an LLM, dir-assistant sometimes sends more context to the LLM than its context window allows.
The counts appear correct on dir-assistant's end. This may be because, in some cases, the embedding model's token set differs from the chat LLM's, so it can't produce accurate token counts.
We should probably use the chat model's token counter when available, and only fall back to the embedding model's when it isn't (for instance, when an API doesn't offer a token counter).
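A minimal sketch of that fallback, assuming both models are llama-cpp-python `Llama` instances that expose `tokenize()`; the helper name and arguments are illustrative, not dir-assistant's actual API:

```python
def count_tokens(text, chat_llm=None, embedding_llm=None):
    # Prefer the chat model's tokenizer so the count matches what llama.cpp
    # will actually evaluate against its context window.
    if chat_llm is not None and hasattr(chat_llm, "tokenize"):
        return len(chat_llm.tokenize(text.encode("utf-8")))
    # Fall back to the embedding model's tokenizer, e.g. when the chat model
    # is behind an API that doesn't expose token counting.
    return len(embedding_llm.tokenize(text.encode("utf-8")))
```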
For API models, a user can simply set the context limit lower than the API's limit. For local models this isn't possible, because the context limit also determines how much memory is allocated.
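For illustration, a minimal llama-cpp-python snippet (model path and prompt are placeholders) of why the local case is stricter: `n_ctx` both caps the usable context and sizes the buffers llama.cpp allocates up front, which is consistent with the IndexError in the traceback above:

```python
from llama_cpp import Llama

# n_ctx sizes the context buffers at load time; tokens beyond it cannot be
# evaluated, so the prompt has to be trimmed before calling the model.
llm = Llama(model_path="models/model.gguf", n_ctx=9200)  # placeholder path

prompt = "..."  # whatever dir-assistant assembled from the RAG context
tokens = llm.tokenize(prompt.encode("utf-8"))
if len(tokens) >= llm.n_ctx():
    raise ValueError("prompt exceeds the allocated context window")
```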