AIAnytime / Llama2-Medical-Chatbot

This is a medical bot built using Llama2 and Sentence Transformers. The bot is powered by Langchain and Chainlit. The bot runs on a decent CPU machine with a minimum of 16GB of RAM.
MIT License

Number of tokens (815) exceeded maximum context length (512). #14

Closed: Gautam-Kantesariya closed this issue 8 months ago

Gautam-Kantesariya commented 11 months ago

I'm getting this error continuously. Which parameters affect this?

(screenshot of the error attached)

JWBWork commented 11 months ago

Did you ever find a solution to this? I'm facing the same issue. I think this is related to ctransformers/transformers in general rather than to this specific model.

I was able to extend the context length somewhat, which lets the conversation continue a little longer before it exceeds the context length. Once the context length is exceeded, the model just starts spewing gibberish.

from ctransformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/WizardLM-7B-uncensored-GGML",
    model_type="llama",
    context_length=2048,  # raise the context window from the 512 default
)
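
If the model is loaded through LangChain's CTransformers wrapper, as this repo does, the same setting can be passed via the wrapper's config dict. A minimal sketch, assuming a langchain version that still exposes langchain.llms and a placeholder model name (swap in whatever the repo's model.py actually loads):

from langchain.llms import CTransformers

# Sketch only: the model name here is an assumption.
llm = CTransformers(
    model="TheBloke/Llama-2-7B-Chat-GGML",
    model_type="llama",
    config={
        "context_length": 2048,  # raise from the 512 default
        "max_new_tokens": 512,
    },
)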

I was at least able to silence these messages like so:

import logging

# Suppress the ctransformers warnings about exceeding the context length
logger = logging.getLogger("ctransformers")
logger.setLevel(logging.ERROR)
Gautam-Kantesariya commented 11 months ago

Instead of CTransformers, I used llama.cpp (via LlamaCpp) to load the model:

from huggingface_hub import hf_hub_download
from langchain.llms import LlamaCpp  # LangChain wrapper around llama-cpp-python

model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7b-Chat-GGUF",
    filename="llama-2-7b-chat.Q4_K_M.gguf",
    resume_download=True,
    cache_dir="/models",  # custom path to save the model
)

llm = LlamaCpp(
    model_path=model_path,
    n_gpu_layers=100,  # set according to your GPU, if you have one
    n_batch=2048,
    verbose=True,
    f16_kv=True,
    n_ctx=4096,  # raise the context window from the 512 default
)
Kunjesh07 commented 9 months ago

(quotes the llama.cpp snippet from the comment above)

Does using this resolve the issue?

manjunathshiva commented 9 months ago

Yes, because he increased the context length from the default 512 to 4096. If you do not have a GPU, do not set n_gpu_layers.
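
For reference, a minimal CPU-only sketch of that fix, based on the snippet above: drop n_gpu_layers entirely and keep the larger context window. The smaller n_batch here is just an assumption meant to be friendlier to a CPU-only machine.

from huggingface_hub import hf_hub_download
from langchain.llms import LlamaCpp

model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7b-Chat-GGUF",
    filename="llama-2-7b-chat.Q4_K_M.gguf",
    cache_dir="/models",
)

# CPU-only: no n_gpu_layers at all; the key fix is n_ctx above the 512 default.
llm = LlamaCpp(
    model_path=model_path,
    n_ctx=4096,
    n_batch=512,   # assumption: smaller batch for CPU-only machines
    f16_kv=True,
    verbose=True,
)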