Closed PhilKes closed 2 months ago
Hey, just curious why ggerganov/llama.cpp#7102 (comment) prompted this?
We use GGUF models from HF (e.g. https://huggingface.co/Qwen/CodeQwen1.5-7B-Chat-GGUF) which (afaik) do not have the special token modifications from your PR (https://github.com/ggerganov/llama.cpp/pull/7166), therefore we can't use them with `/infill`, and reverted back to using `/completion` for FIM prompts.
That's nice, but we would need a working GGUF for every model we support. The list is quite long (see HuggingFaceModel), with different FIM prompt templates (see InfillPromptTemplate), and it's continuously growing as new models come up. We want the same solution for every existing and new model, so `/infill` is not an option for us at the moment.
In that case I think you will have to maintain your own copies, which shouldn't be that hard using `gguf-new-metadata.py`. I doubt anyone else is going to bother manually adding this metadata, and it's unlikely the conversion scripts will; they aren't quite working properly even on the models they do add it to (instruct/chat-tuned models can lose fill-in-middle capability even though they still have the tokens).
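For reference, maintaining a patched copy could look roughly like this. This is a sketch only: the `--special-token` flag and the token names are assumptions about the `gguf-new-metadata.py` interface in llama.cpp's gguf-py scripts, so check `--help` on your checkout before relying on it.

```shell
# Hypothetical invocation: copy a GGUF while adding FIM special-token
# metadata so the /infill endpoint can find the tokens.
python gguf-new-metadata.py in.gguf out.gguf \
  --special-token prefix "<fim_prefix>" \
  --special-token suffix "<fim_suffix>" \
  --special-token middle "<fim_middle>"
```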
Thanks for your suggestion, but it's easier for us to just maintain the FIM templates in our project and use the `/completion` endpoint than to maintain our own GGUF copies.
This reverts commit 8de72b330178fa97b0482e1dbb2829964f8b737a.
As discussed in #510 this reverts the switch from `/completion` to `/infill`.