Open curvedinf opened 2 months ago
I'm seeing a possibly related error; when loading the cached embeddings, it states:
Creating index from embeddings...
Loading local LLM model...
LLM context size: 9200
Then, when asking a question, it started writing out the answer and after a while returned:
(generating contextual guidance...)Traceback (most recent call last):
File "/home/user/.local/bin/dir-assistant", line 8, in <module>
sys.exit(main())
File "/home/user/.local/lib/python3.10/site-packages/dir_assistant/main.py", line 153, in main
start(args, config_dict["DIR_ASSISTANT"])
File "/home/user/.local/lib/python3.10/site-packages/dir_assistant/cli/start.py", line 166, in start
llm.stream_chat(user_input)
File "/home/user/.local/lib/python3.10/site-packages/dir_assistant/assistant/git_assistant.py", line 101, in stream_chat
super().stream_chat(user_input)
File "/home/user/.local/lib/python3.10/site-packages/dir_assistant/assistant/base_assistant.py", line 132, in stream_chat
stream_output = self.run_stream_processes(user_input, True)
File "/home/user/.local/lib/python3.10/site-packages/dir_assistant/assistant/cgrag_assistant.py", line 98, in run_stream_processes
output_history = self.run_completion_generator(cgrag_generator, output_history, False)
File "/home/user/.local/lib/python3.10/site-packages/dir_assistant/assistant/llama_cpp_assistant.py", line 49, in run_completion_generator
for chunk in completion_output:
File "/home/user/.local/lib/python3.10/site-packages/llama_cpp/llama_chat_format.py", line 289, in _convert_text_completion_chunks_to_chat
for i, chunk in enumerate(chunks):
File "/home/user/.local/lib/python3.10/site-packages/llama_cpp/llama.py", line 1216, in _create_completion
for token in self.generate(
File "/home/user/.local/lib/python3.10/site-packages/llama_cpp/llama.py", line 808, in generate
self.eval(tokens)
File "/home/user/.local/lib/python3.10/site-packages/llama_cpp/llama.py", line 658, in eval
self.scores[n_past + n_tokens - 1, :].reshape(-1)[::] = logits
IndexError: index 9200 is out of bounds for axis 0 with size 9200
Is this the same issue?
No, that issue is most likely caused by having no files to load in the current directory. I've added a descriptive message and a controlled exit in the version I'll push up the next time I update. For now, add an empty file to start.
There are definitely files, and the code it initially started to give back was good; it then failed with the error part-way through. This is with the default models/config and CUDA enabled. I'll retest when the next update is available anyway, no rush :)
I'll take a look through and see what could be causing it.
When chatting with an LLM, dir-assistant sometimes sends more context to the LLM than its context window allows.
The counts appear correct on dir-assistant's end. This may be because, in some cases, the embedding model's token set differs from the chat LLM's, so it can't produce accurate token counts.
We should probably use the chat model's token counter when available, and only fall back to the embedding model's when it isn't (for instance, when an API doesn't offer a token counter).
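A minimal sketch of that fallback, assuming both models are llama-cpp-python `Llama` instances that expose `tokenize()`; the helper name and arguments are illustrative, not dir-assistant's actual API:

```python
def count_tokens(text, chat_llm=None, embedding_llm=None):
    # Prefer the chat model's tokenizer so the count matches what llama.cpp
    # will actually evaluate against its context window.
    if chat_llm is not None and hasattr(chat_llm, "tokenize"):
        return len(chat_llm.tokenize(text.encode("utf-8")))
    # Fall back to the embedding model's tokenizer, e.g. when the chat model
    # is behind an API that doesn't expose token counting.
    return len(embedding_llm.tokenize(text.encode("utf-8")))
```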
For API models, a user can simply set the context limit lower than the API's limit. For local models this isn't possible, because the context limit also determines how much memory is allocated.
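For illustration, a minimal llama-cpp-python snippet (model path and prompt are placeholders) of why the local case is stricter: `n_ctx` both caps the usable context and sizes the buffers llama.cpp allocates up front, which is consistent with the IndexError in the traceback above:

```python
from llama_cpp import Llama

# n_ctx sizes the context buffers at load time; tokens beyond it cannot be
# evaluated, so the prompt has to be trimmed before calling the model.
llm = Llama(model_path="models/model.gguf", n_ctx=9200)  # placeholder path

prompt = "..."  # whatever dir-assistant assembled from the RAG context
tokens = llm.tokenize(prompt.encode("utf-8"))
if len(tokens) >= llm.n_ctx():
    raise ValueError("prompt exceeds the allocated context window")
```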