ggerganov / llama.cpp

server : add "token healing" support #5765

CyberShadow opened this issue 4 months ago

CyberShadow commented 4 months ago

Feature Description

Hi! I am experimenting with using llama.cpp as a general-purpose code completion backend, similar to TabNine.

I am encountering a small problem: if the completion prompt ends mid-word, the results are not very accurate. For example, for a prompt such as `Five, Four, Thre`, the model will often ignore the unfinished word and suggest `, Two` (forming `Thre, Two`).

I think the following behavior would be useful as an option to the `/completion` server API:

  1. Tokenize the text
  2. Chop off the last token
  3. Run the prediction with the remaining tokens, but for the first predicted token only consider candidates whose bytes start with the bytes of the removed token (see the sketch below).
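
For illustration, here is a rough C++ sketch of steps 1 and 2 plus the bookkeeping needed for step 3 (`tokenize` and `token_text` are hypothetical stand-ins for the real tokenizer calls, not existing llama.cpp functions):

```cpp
// Sketch of the prompt-side preparation for token healing:
// tokenize the prompt, drop the last token, and remember its bytes so the
// first generated token can later be constrained to continue them.
#include <string>
#include <vector>

std::vector<int> tokenize(const std::string & text);  // assumed helper
std::string      token_text(int token_id);            // assumed helper

struct HealedPrompt {
    std::vector<int> tokens;  // prompt tokens with the last one removed
    std::string      prefix;  // bytes the first generated token must continue
};

HealedPrompt prepare_token_healing(const std::string & prompt) {
    HealedPrompt out;
    out.tokens = tokenize(prompt);
    if (!out.tokens.empty()) {
        out.prefix = token_text(out.tokens.back());  // e.g. " Thre"
        out.tokens.pop_back();
    }
    return out;
}
```

The server would then evaluate `tokens` as the prompt and use `prefix` to constrain the first sampled token, as described in step 3.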

Thanks!

stduhpf commented 4 months ago

The usual name for this feature is "token healing". I agree that it would be nice to have it supported here.

ilyannn commented 4 months ago

@ggerganov I'd like to try working on it as my first issue!

ggerganov commented 4 months ago

Ok. This can be demonstrated in one of the examples. One way would be to add it to `main` or `simple` and extend `llama_sampling_sample` with the necessary functionality.
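
For reference, a minimal self-contained sketch of the constraint itself, independent of the actual `llama_sampling_sample` internals (the `Candidate` struct and function names are illustrative, not part of the llama.cpp API): before the usual samplers run for the first generated token, push the logits of candidates that cannot continue the removed prefix to negative infinity.

```cpp
// Illustrative only: mask out candidates that cannot continue the bytes of
// the token that was chopped off the end of the prompt.
#include <limits>
#include <string>
#include <vector>

struct Candidate {
    int         id;
    float       logit;
    std::string piece;  // decoded bytes of the token
};

static bool starts_with(const std::string & s, const std::string & prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Applied only while healing_prefix is non-empty, i.e. for the first
// generated token; afterwards the prefix is cleared and sampling proceeds
// as usual.
void apply_token_healing_mask(std::vector<Candidate> & candidates,
                              const std::string & healing_prefix) {
    if (healing_prefix.empty()) {
        return;
    }
    for (auto & c : candidates) {
        const bool ok = starts_with(c.piece, healing_prefix) ||  // covers the prefix
                        starts_with(healing_prefix, c.piece);    // partial continuation
        if (!ok) {
            c.logit = -std::numeric_limits<float>::infinity();
        }
    }
}
```

The second `starts_with` case keeps tokens that are themselves a prefix of the removed bytes, which allows the healed text to span multiple generated tokens if needed.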

mare5x commented 2 months ago

Hi @ilyannn, do you still want to work on this? I've created a draft PR (#7028) that demonstrates token healing, but I still haven't added it to main or server. We can collaborate on that, if you'd like.

ilyannn commented 2 months ago

@mare5x Sorry, I have not actually started so please don't wait for me. I'll try to take a look at your PR this week though and will be happy to help in any way I can.