av / harbor

Effortlessly run LLM backends, APIs, frontends, and services with one command.
https://github.com/av/harbor
Apache License 2.0

Changing cache location isn't working in tabbyapi, vllm #63

Closed: zheroz00 closed 2 weeks ago

zheroz00 commented 1 month ago

Hey. I installed this on Linux, changed the model for tabby to 'bullerwins_gemma-2-2b-it-exl2_8.0bpw', and then used the HF downloader to download the model. I've confirmed it was downloaded to the folder below.

(~/.cache/huggingface/hub/models--bullerwins--gemma-2-2b-it-exl2_8.0bpw)

When I run harbor up tabbyapi I get the error below.

File "/app/backends/exllamav2/model.py", line 216, in init self.config.prepare() File "/usr/local/lib/python3.11/dist-packages/exllamav2/config.py", line 166, in prepare assert os.path.exists(self.model_dir), "Can't find " + self.model_dir **AssertionError: Can't find /models/hf/bullerwins_gemma-2-2b-it-exl2_8.0bpw**

What step am I missing? Where is the /models/hf folder that it's looking for?

Great project. Thx.

av commented 3 weeks ago

Thanks for another report!

The current TabbyAPI integration is indeed a bit strict about model placement, but it's also designed to be straightforward. Here's a sample set of commands to run the model above:

# 1. Use HuggingFaceDownloader CLI to get the files from the HF Hub
$ harbor hf dl -m bullerwins/gemma-2-2b-it-exl2_8.0bpw -s ./hf
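# Note: "-s ./hf" stores the files in Harbor's "hf" folder instead of the
# default ~/.cache/huggingface/hub cache; that folder is mounted into the
# TabbyAPI container as /models/hf, which is the path from the assertion
# error above (this mapping is inferred from the error, so the exact host
# location may vary with your Harbor setup)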

# [Optional] Use "harbor find" to observe the files on local file system
# uses substring match on path for the filter
$ harbor find gemma-2-2b-it-exl2_8.0bpw

# 2. Set the model for tabbyapi, using "harbor tabbyapi model"
# with the same model specifier as for the download - i.e. "user/repo" from HF
$ harbor tabbyapi model bullerwins/gemma-2-2b-it-exl2_8.0bpw
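# [Optional] Confirm what the config now points at. The exact config key
# below is an assumption - adjust it if your Harbor version names it differently
$ harbor config get tabbyapi.model.specifier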

# 3. Start the TabbyAPI service
$ harbor up tabbyapi
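# [Optional] Tail the service logs to watch the model load
# (assuming your Harbor version provides "harbor logs" as a compose logs wrapper)
$ harbor logs tabbyapi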

# [Optional] sample chat completion
# curl is just for portability, there are nicer HTTP clients
curl $(harbor url tabbyapi)/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer $(harbor config get tabbyapi.api_key)" -d '{
  "model": "hf",
  "messages": [
    {
        "role": "user",
        "content": "How to eat cake?"
    }
  ],
  "max_tokens": 60
}'
# Sample output:
{
  "id": "chatcmpl-da3f2060ffc04cfa89a9ebd20e2672d3",
  "choices": [
    {
      "index": 0,
      "finish_reason": "length",
      "message": {
        "role": "assistant",
        "content": "Ah, the age-old question! Here's a guide on how to enjoy a piece of cake:\n\n**The Classic Way:**\n\n1. **Present Position:**  Find a comfortable spot, preferably seated or with a cushioned chair. Aim for an upright position to keep your body and posture"
      },
      "logprobs": null
    }
  ],
  "created": 1730025638,
  "model": "bullerwins_gemma-2-2b-it-exl2_8.0bpw",
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 60,
    "total_tokens": 74
  }
}
av commented 3 weeks ago

I've added more detailed explanations to the TabbyAPI service guide.

av commented 2 weeks ago

Closing for now; please feel free to follow up or open a new issue if needed.