I tried to load the Mistral OpenOrca model but couldn't, so I changed the code slightly. Now, the first time I ask a query there is no answer, and only when I ask the same query again does it respond, possibly from a cache. This is all new to me.
Please let me know if there is a way to load the "mistral-7b-openorca.Q4_K_M.gguf" model without editing the code.
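For context, my understanding is that a minimal standalone load with the ctransformers package (outside LangChain) would look like the sketch below; the model_type value is my guess, mirroring the default code:

# Minimal standalone check with the ctransformers package, no LangChain.
# model_type="llama" mirrors the default code; I am not sure whether a
# dedicated "mistral" type exists.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "models/mistral-7b-openorca.Q4_K_M.gguf",
    model_type="llama",
)
print(llm("Hello", max_new_tokens=16))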
My edit:
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import LlamaCpp


def build_llm(model):
    # Determine how many CPU threads to use
    num_threads = determine_threads_to_use()
    print(f"Number of threads available: {num_threads}")
    llm = LlamaCpp(
        model_path=cfg.MODEL_BIN_DIR + "/" + model,
        n_gpu_layers=num_threads,
        # f16_kv=True,
        n_batch=8192 // 4,  # integer division; n_batch expects an int
        n_ctx=8192,
        max_tokens=cfg.MAX_NEW_TOKENS,
        n_threads=num_threads,
        temperature=cfg.TEMPERATURE,
        streaming=True,
        repeat_penalty=1.3,
        callbacks=[StreamingStdOutCallbackHandler()],
    )
    # Original (default) local CTransformers model:
    # llm = CTransformers(
    #     model=cfg.MODEL_BIN_DIR + "/" + model,
    #     model_type="llama",
    #     config={
    #         "max_new_tokens": cfg.MAX_NEW_TOKENS,
    #         "temperature": cfg.TEMPERATURE,
    #         "threads": num_threads,
    #         "stream": True,
    #         "repetition_penalty": 1.3,
    #     },
    #     callbacks=[StreamingStdOutCallbackHandler()],
    # )
    return llm
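For reference, this is roughly how the function is exercised (the prompt is a placeholder, not my real query):

# Hypothetical driver, only to show how build_llm() is called.
llm = build_llm("mistral-7b-openorca.Q4_K_M.gguf")
# The first call prints nothing; re-sending the same prompt then
# streams a response, possibly from a cache.
print(llm("What is the capital of France?"))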
If I use the default code, the error is:
RuntimeError: Failed to create LLM 'llama' from 'models/mistral-7b-openorca.Q4_K_M.gguf'.
The error is the same no matter which model_type I set.
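In case it matters, I believe older ctransformers builds only read GGML files, so the installed version may be the issue; this is how I would check it (the 0.2.24 cutoff is my assumption, not something I have verified):

# Check the installed ctransformers version.
# Assumption: GGUF support requires roughly >= 0.2.24.
from importlib.metadata import version
print(version("ctransformers"))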