Qwen 2.5 Coder 7B outperforms many models there, so yes, this is a must-have.
I can use Ollama with this data/config.toml:
```toml
[model.completion.http]
kind = "ollama/completion"
model_name = "qwen2.5-coder:7b-base"
api_endpoint = "YOUR_ENDPOINT"
prompt_template = "<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

[model.chat.http]
kind = "openai/chat"
model_name = "qwen2.5-coder"
api_endpoint = "YOUR_ENDPOINT/v1"
api_key = "dummy"
```
but it returns <|endoftext|>; I don't know whether it is an Ollama or a Tabby problem.
@zwpaper I can work on this issue. But I think downloading by inference (deriving the remaining parts from the first filename) might lead to problems, as we can't be sure that inference holds for every model. I believe it would be better to download by matching filenames with a regular expression or by specifying all the files explicitly. What do you think?
Fixed in 0.19.
Please describe the feature you want
Tabby currently downloads a GGUF model from the URL specified in the model registry, but it only supports one URL per model; the vec is used for selecting one URL via TABBY_DOWNLOAD_HOST: https://github.com/TabbyML/tabby/blob/ca7895b2f80f81c2b723ab7d4bd1f3fc5edd32fc/crates/tabby-common/src/registry.rs#L10-L19
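As a rough illustration of that host-based selection (the names here are hypothetical, not Tabby's actual API; see the linked registry.rs for the real definition):

```rust
use std::env;

// Hypothetical sketch: pick one download URL from the registry's Vec of
// mirrors, preferring the host named by TABBY_DOWNLOAD_HOST when it is set.
fn select_url(urls: &[String]) -> Option<&String> {
    match env::var("TABBY_DOWNLOAD_HOST") {
        Ok(host) => urls
            .iter()
            .find(|u| u.contains(host.as_str()))
            .or_else(|| urls.first()),
        Err(_) => urls.first(),
    }
}
```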
Qwen 2.5 models have multiple GGUF files for each model.
Solution
Specify the first-part URL in the standard filename format, e.g. qwen2.5-coder-7b-instruct-q8_0-00001-of-00003.gguf; we can parse both the part index and the total number of parts from the filename, then download all parts of the GGUF (see the sketch below). This is also how llama-server supports split models.
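A minimal sketch of that parsing step, assuming the standard -NNNNN-of-MMMMM.gguf suffix (the function name and error handling are illustrative, not Tabby's actual implementation):

```rust
/// Given the URL of the first part, return the URLs of all parts, or None
/// if the filename does not follow the "-00001-of-00003.gguf" convention.
/// (Hypothetical helper, not Tabby's actual code.)
fn split_gguf_urls(first_part_url: &str) -> Option<Vec<String>> {
    let stem = first_part_url.strip_suffix(".gguf")?;
    // "…-q8_0-00001-of-00003" -> prefix "…-q8_0", index "00001", total "00003".
    let (head, total_str) = stem.rsplit_once("-of-")?;
    let (prefix, index_str) = head.rsplit_once('-')?;
    let index: usize = index_str.parse().ok()?;
    let total: usize = total_str.parse().ok()?;
    if index != 1 || total < 2 {
        return None; // single-file model, or not the first part
    }
    let width = index_str.len();
    Some(
        (1..=total)
            .map(|i| format!("{prefix}-{i:0width$}-of-{total_str}.gguf"))
            .collect(),
    )
}
```

Called on the example URL above, this yields the three part URLs ending in -00001-of-00003.gguf through -00003-of-00003.gguf.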
Tabby currently saves the model file under the name model.gguf; for split models, we should also append the suffix, e.g. model-00001-of-00003.gguf.
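A sketch of that naming rule, assuming the same zero-padded suffix as the upstream filenames (hypothetical helper, not Tabby's actual code):

```rust
/// Local filename for part `index` of `total`; single-file models keep the
/// plain "model.gguf" name. (Hypothetical helper, not Tabby's actual code.)
fn local_model_filename(index: usize, total: usize) -> String {
    if total <= 1 {
        "model.gguf".to_string()
    } else {
        // e.g. "model-00001-of-00003.gguf"
        format!("model-{index:05}-of-{total:05}.gguf")
    }
}
```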
Please reply with a 👍 if you want this feature.