Closed PhilKes closed 2 months ago
Hey, just curious why ggerganov/llama.cpp#7102 (comment) prompted this?
We use GGUF models from HF (e.g. https://huggingface.co/Qwen/CodeQwen1.5-7B-Chat-GGUF) which (afaik) do not have the special token modifications from your PR (https://github.com/ggerganov/llama.cpp/pull/7166), therefore we can't use them with `/infill`, and reverted back to using `/completion` for FIM prompts.
That's nice, but we would need a working GGUF for every model we support. The list is quite long (see HuggingFaceModel), with different FIM prompt templates (see InfillPromptTemplate), and it's continuously growing as new models come up. We want the same solution for every existing and new model, so `/infill` is not an option for us at the moment.
In that case I think you will have to maintain your own copies, which shouldn't be that hard using `gguf-new-metadata.py`. I doubt anyone else is going to bother manually adding this metadata, and it's unlikely the conversion scripts will; they aren't quite working properly even on the models they do add it to (instruct/chat-tuned models can lose fill-in-middle capability even though they still have the tokens).
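For reference, maintaining a patched copy could look roughly like this. This is a sketch only: the `--special-token` flag and the token names are assumptions about the `gguf-new-metadata.py` interface in llama.cpp's gguf-py scripts, so check `--help` on your checkout before relying on it.

```shell
# Hypothetical invocation: copy a GGUF while adding FIM special-token
# metadata so the /infill endpoint can find the tokens.
python gguf-new-metadata.py in.gguf out.gguf \
  --special-token prefix "<fim_prefix>" \
  --special-token suffix "<fim_suffix>" \
  --special-token middle "<fim_middle>"
```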
Thanks for your suggestion, but it's easier for us to just maintain the FIM templates in our project and use the `/completion` endpoint than to maintain our own GGUF copies.
This reverts commit 8de72b330178fa97b0482e1dbb2829964f8b737a.
As discussed in #510 this reverts the switch from `/completion` to `/infill`.