amaiya / onprem

A tool for running on-premises large language models with non-public data
https://amaiya.github.io/onprem
Apache License 2.0

Support for Mistral 7B Model #40

Closed rabilrbl closed 10 months ago

rabilrbl commented 10 months ago

Mistral is one of the most efficient and fastest models, with good results.

model_url: https://huggingface.co/TheBloke/SlimOpenOrca-Mistral-7B-GGUF/blob/main/slimopenorca-mistral-7b.Q4_K_M.gguf

Program:

from onprem import LLM
llm = LLM("https://huggingface.co/TheBloke/SlimOpenOrca-Mistral-7B-GGUF/blob/main/slimopenorca-mistral-7b.Q4_K_M.gguf",n_gpu_layers=32, verbose=True, max_tokens=2048)

Error Output:

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
[<ipython-input-16-05ceb5047103>](https://localhost:8080/#) in <cell line: 2>()
      1 from onprem import LLM
----> 2 llm = LLM("https://huggingface.co/TheBloke/SlimOpenOrca-Mistral-7B-GGUF/blob/main/slimopenorca-mistral-7b.Q4_K_M.gguf",n_gpu_layers=32, verbose=True, max_tokens=2048)

3 frames
[/usr/local/lib/python3.10/dist-packages/onprem/core.py](https://localhost:8080/#) in __init__(self, model_url, use_larger, n_gpu_layers, model_download_path, vectordb_path, max_tokens, n_ctx, n_batch, mute_stream, callbacks, embedding_model_name, embedding_model_kwargs, embedding_encode_kwargs, rag_num_source_docs, rag_score_threshold, confirm, verbose, **kwargs)
    117         self.verbose = verbose
    118         self.extra_kwargs = kwargs
--> 119         self.load_llm()
    120 
    121     @classmethod

[/usr/local/lib/python3.10/dist-packages/onprem/core.py](https://localhost:8080/#) in load_llm(self)
    224 
    225         if not self.llm:
--> 226             self.llm = llm = LlamaCpp(
    227                 model_path=model_path,
    228                 max_tokens=self.max_tokens,

[/usr/local/lib/python3.10/dist-packages/langchain/load/serializable.py](https://localhost:8080/#) in __init__(self, **kwargs)
     72 
     73     def __init__(self, **kwargs: Any) -> None:
---> 74         super().__init__(**kwargs)
     75         self._lc_kwargs = kwargs
     76 

/usr/local/lib/python3.10/dist-packages/pydantic/main.cpython-310-x86_64-linux-gnu.so in pydantic.main.BaseModel.__init__()

ValidationError: 1 validation error for LlamaCpp
__root__
  Could not load Llama model from path: /root/onprem_data/slimopenorca-mistral-7b.Q4_K_M.gguf. Received error  (type=value_error)
amaiya commented 10 months ago

Hello @rabilrbl ,

It looks like you didn't point to the right URL. Your URL points to the file's download page (paste the URL into a browser to see), not to the actual GGUF model. The actual model file is accessible via the "download" link on that page, which is:

https://huggingface.co/TheBloke/SlimOpenOrca-Mistral-7B-GGUF/resolve/main/slimopenorca-mistral-7b.Q4_K_M.gguf

If you use the above URL, everything works.
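As a quick sketch of the difference: Hugging Face file-page URLs contain `/blob/`, while the direct-download URLs contain `/resolve/`. A small helper (hypothetical, not part of onprem) can rewrite one into the other before passing it to `LLM`:

```python
def to_resolve_url(blob_url: str) -> str:
    """Convert a Hugging Face /blob/ page URL into the direct /resolve/ download URL."""
    return blob_url.replace("/blob/", "/resolve/", 1)

page_url = (
    "https://huggingface.co/TheBloke/SlimOpenOrca-Mistral-7B-GGUF"
    "/blob/main/slimopenorca-mistral-7b.Q4_K_M.gguf"
)
print(to_resolve_url(page_url))
# https://huggingface.co/TheBloke/SlimOpenOrca-Mistral-7B-GGUF/resolve/main/slimopenorca-mistral-7b.Q4_K_M.gguf
```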

I've tried all of the following Mistral models and they all work perfectly:

mistral-7b-instruct-v0.1.Q2_K.gguf
mistral-7b-v0.1.Q4_K_M.gguf
slimopenorca-mistral-7b.Q4_K_M.gguf

rabilrbl commented 10 months ago

@amaiya my bad.