AIAnytime / ChatCSV-Llama2-Chatbot

ChatCSV bot using Llama 2, Sentence Transformers, CTransformers, Langchain, and Streamlit.

Number of tokens (757) exceeded maximum context length (512). #2

Open datacrud8 opened 1 year ago

datacrud8 commented 1 year ago

Hi, I'm trying to build this app locally and used the same model, llama-2-7b-chat.ggmlv3.q8_0.bin. When I run the app, the UI shows some random message like the one you showed, but checking the console I get the message below:

Number of tokens (755) exceeded maximum context length (512).
Number of tokens (756) exceeded maximum context length (512).
Number of tokens (757) exceeded maximum context length (512).

So I increased max_new_tokens=2048, increased n_ctx, and added truncate=True; none of them fixed the issue. I changed the model as well, as shown below, and still hit the same issue.
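For reference, here is roughly the change (a sketch; the config keys follow LangChain's CTransformers wrapper, and the context_length value is assumed):

    from langchain.llms import CTransformers

    # Sketch of the attempted fix: raise the generation and context limits.
    llm = CTransformers(
        model='llama-2-7b-chat.ggmlv3.q8_0.bin',
        model_type='llama',
        config={
            'max_new_tokens': 2048,   # raised from the default
            'context_length': 2048,   # assumed value; attempt to widen the 512-token window
        },
    )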

Do you know of any solution to this issue?

balavenkatesh-ai commented 11 months ago

Same issue @AIAnytime

AIAnytime commented 10 months ago

Hi, CTransformers has a maximum context size of only 512 tokens. For k=1 or 2 in the retriever it will work fine, but if k >= 3 your context window may be exceeded. It's better to use llama-cpp-python if you have an arbitrary number of retrieved chunks being fed to the LLM.
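For example, capping the retriever keeps the prompt inside the 512-token window (a sketch; db stands for whatever vector store the app builds):

    # Sketch: limit how many retrieved chunks are packed into the prompt.
    retriever = db.as_retriever(search_kwargs={'k': 2})  # k=1 or 2 stays within 512 tokens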

mynampati commented 10 months ago

Thanks for your suggestion; I was facing the same problem.

gregdolder commented 6 months ago

> Hi, CTransformers has a maximum context size of only 512 tokens. For k=1 or 2 in the retriever it will work fine, but if k >= 3 your context window may be exceeded. It's better to use llama-cpp-python if you have an arbitrary number of retrieved chunks being fed to the LLM.

Can you explain this a bit more? I tried changing the local model loader to use LlamaCpp and I'm getting validation errors:

    import os

    from langchain.callbacks.manager import CallbackManager
    from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
    from langchain.llms import llamacpp

    def load_llm():
        # Resolve the model file relative to this script.
        current_dir = os.path.dirname(os.path.abspath(__file__))
        model_path = os.path.join(current_dir, 'llama-2-7b-chat.ggmlv3.q8_0.bin')
        # Stream generated tokens to stdout.
        callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
        llm = llamacpp.LlamaCpp(
            model_path=model_path,
            temperature=0.5,
            max_tokens=5500,
            top_p=1,
            callback_manager=callback_manager,
            verbose=True,
        )
        return llm

I'm getting Could not load Llama model from path: <path here> (type=value_error), and I know the path is good, since the model loaded fine via CTransformers and only had the token-exceeded issue. Any advice?

Never mind; I just read that LlamaCpp no longer supports GGML models, so I'll have to use a different one (a GGUF build).
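For anyone hitting the same error, here is a minimal sketch of loading a GGUF build instead (the filename and n_ctx value are assumed):

    from langchain.llms import llamacpp

    # Sketch: recent llama.cpp builds load GGUF files, not GGML.
    llm = llamacpp.LlamaCpp(
        model_path='llama-2-7b-chat.Q8_0.gguf',  # hypothetical GGUF filename
        n_ctx=2048,       # llama.cpp allows raising the context window
        max_tokens=512,
        temperature=0.5,
        verbose=True,
    )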