Qwen 2.5 Coder 7B outperforms many models there, so yes, this is a must-have.
I can use Ollama with this data/config.toml:
```toml
[model.completion.http]
kind = "ollama/completion"
model_name = "qwen2.5-coder:7b-base"
api_endpoint = "YOUR_ENDPOINT"
prompt_template = "<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

[model.chat.http]
kind = "openai/chat"
model_name = "qwen2.5-coder"
api_endpoint = "YOUR_ENDPOINT/v1"
api_key = "dummy"
```
but it returns <|endoftext|>; I don't know whether it is an Ollama or a Tabby problem.
@zwpaper I can work on this issue. But I think downloading by inference (deriving the remaining parts from the first filename) might lead to problems, as we can't be sure that inference holds for every model. I believe it would be better to download by matching filenames with a regular expression or by specifying all the files explicitly. What do you think?
Fixed in 0.19.
Please describe the feature you want
Tabby currently downloads a GGUF model from the URL specified in the model registry, but it only supports one URL per model; the vec is used for selecting one URL via TABBY_DOWNLOAD_HOST: https://github.com/TabbyML/tabby/blob/ca7895b2f80f81c2b723ab7d4bd1f3fc5edd32fc/crates/tabby-common/src/registry.rs#L10-L19
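As a rough illustration of that host-based selection (the names here are hypothetical, not Tabby's actual API; see the linked registry.rs for the real definition):

```rust
use std::env;

// Hypothetical sketch: pick one download URL from the registry's Vec of
// mirrors, preferring the host named by TABBY_DOWNLOAD_HOST when it is set.
fn select_url(urls: &[String]) -> Option<&String> {
    match env::var("TABBY_DOWNLOAD_HOST") {
        Ok(host) => urls
            .iter()
            .find(|u| u.contains(host.as_str()))
            .or_else(|| urls.first()),
        Err(_) => urls.first(),
    }
}
```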
Qwen 2.5 models have multiple GGUF files for each model.
Solution
Specify the first-part URL in the standard filename format, e.g. qwen2.5-coder-7b-instruct-q8_0-00001-of-00003.gguf; we can parse both the part index and the total number of parts from the filename, then download all parts of the GGUF (see the sketch below). This is also how llama-server supports split models.
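A minimal sketch of that parsing step, assuming the standard -NNNNN-of-MMMMM.gguf suffix (the function name and error handling are illustrative, not Tabby's actual implementation):

```rust
/// Given the URL of the first part, return the URLs of all parts, or None
/// if the filename does not follow the "-00001-of-00003.gguf" convention.
/// (Hypothetical helper, not Tabby's actual code.)
fn split_gguf_urls(first_part_url: &str) -> Option<Vec<String>> {
    let stem = first_part_url.strip_suffix(".gguf")?;
    // "…-q8_0-00001-of-00003" -> prefix "…-q8_0", index "00001", total "00003".
    let (head, total_str) = stem.rsplit_once("-of-")?;
    let (prefix, index_str) = head.rsplit_once('-')?;
    let index: usize = index_str.parse().ok()?;
    let total: usize = total_str.parse().ok()?;
    if index != 1 || total < 2 {
        return None; // single-file model, or not the first part
    }
    let width = index_str.len();
    Some(
        (1..=total)
            .map(|i| format!("{prefix}-{i:0width$}-of-{total_str}.gguf"))
            .collect(),
    )
}
```

Called on the example URL above, this yields the three part URLs ending in -00001-of-00003.gguf through -00003-of-00003.gguf.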
Tabby currently saves the model file under the name model.gguf; for split models, we should also append the suffix, e.g. model-00001-of-00003.gguf.
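A sketch of that naming rule, assuming the same zero-padded suffix as the upstream filenames (hypothetical helper, not Tabby's actual code):

```rust
/// Local filename for part `index` of `total`; single-file models keep the
/// plain "model.gguf" name. (Hypothetical helper, not Tabby's actual code.)
fn local_model_filename(index: usize, total: usize) -> String {
    if total <= 1 {
        "model.gguf".to_string()
    } else {
        // e.g. "model-00001-of-00003.gguf"
        format!("model-{index:05}-of-{total:05}.gguf")
    }
}
```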
Please reply with a 👍 if you want this feature.