PromtEngineer / localGPT

Chat with your documents on your local device using GPT models. No data leaves your device, and it is 100% private.

ValueError: Requested tokens (4257) exceed context window of 4096 #364

Open linfengca opened 1 year ago

linfengca commented 1 year ago

If I change 4096 to 8192 ("max_ctx_size = 8192"), it raises the same error message...

PromtEngineer commented 1 year ago

Which model are you using, and in which format (GGML or GPTQ)?

HamiguaLu commented 1 year ago

Encountered a similar issue. I am using Mistral 7B on my PC (RTX A4000):

MODEL_ID = "TheBloke/Mistral-7B-Instruct-v0.1-GGUF"
MODEL_BASENAME = "mistral-7b-instruct-v0.1.Q8_0.gguf"

stonez56 commented 1 year ago

I have an Intel-based MacBook Pro with 16 GB RAM. Here is the model I used (zephyr-7B-beta-GGUF):

MODEL_ID = "TheBloke/zephyr-7B-beta-GGUF"
MODEL_BASENAME = "zephyr-7b-beta.Q4_K_M.gguf"

When running the run_localGPT.py file, I encountered this error:

pydantic.error_wrappers.ValidationError: 1 validation error for LLMChain
llm none is not an allowed value (type=type_error.none.not_allowed)

and I fixed it following another user's recommendation:

pip install llama-cpp-python==0.1.83

Then I ran the run_localGPT.py file again, and this error occurred:

ValueError: Requested tokens (5543) exceed context window of 4096

@PromtEngineer
I've tried changing CONTEXT_WINDOW_SIZE from 4096 to 2048 and then to 1024, and none of them worked. Not sure where the issue is. Can you help?
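
For context on where this limit comes from: the prompt that localGPT builds (your question plus the retrieved document chunks) and the tokens requested for generation must together fit in the model's context window. Below is a minimal sketch of how these settings typically interact, assuming the GGUF model is loaded through LangChain's LlamaCpp wrapper and that the names follow constants.py; the exact call in your version may differ.

from langchain.llms import LlamaCpp

CONTEXT_WINDOW_SIZE = 4096   # n_ctx: total tokens the model can attend to at once
MAX_NEW_TOKENS = 2048        # tokens reserved for the generated answer

llm = LlamaCpp(
    model_path="models/zephyr-7b-beta.Q4_K_M.gguf",  # hypothetical local path
    n_ctx=CONTEXT_WINDOW_SIZE,
    max_tokens=MAX_NEW_TOKENS,
    n_batch=512,
)

# llama-cpp-python raises "Requested tokens (...) exceed context window of ..."
# roughly when prompt_tokens + max_tokens > n_ctx, so lowering
# CONTEXT_WINDOW_SIZE alone cannot help. Either raise n_ctx (if the model
# supports it), shrink the prompt (smaller chunks / fewer retrieved docs),
# or lower MAX_NEW_TOKENS.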

PromtEngineer commented 1 year ago

@stonez56 You need to change the chunk_size here. Set it to a smaller value such as 700; if that works, keep increasing it toward something like 800. Play around with the value.
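
For anyone searching later: chunk_size is set where the documents are split during ingestion. A minimal sketch of the change, assuming ingest.py uses LangChain's RecursiveCharacterTextSplitter (the surrounding code may differ between versions):

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Smaller chunks mean each retrieved passage contributes fewer tokens to the prompt.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=700, chunk_overlap=200)
texts = text_splitter.split_documents(documents)

After changing chunk_size, re-run ingest.py so the vector store is rebuilt with the new chunks.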

stonez56 commented 1 year ago

@PromtEngineer I changed CONTEXT_WINDOW_SIZE in the constants.py file from 4096 to 8192 and it worked! :)
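
For reference, the edit that worked here is a one-line change in constants.py; a sketch, with MAX_NEW_TOKENS shown only as an illustration of a related setting (your copy may define it differently):

CONTEXT_WINDOW_SIZE = 8192            # was 4096; must stay within what the model supports
MAX_NEW_TOKENS = CONTEXT_WINDOW_SIZE  # upper bound on tokens generated for the answer

This likely works because zephyr-7B-beta is a Mistral fine-tune with a context larger than 4096; models trained with only a 2k or 4k window will not accept a larger value.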