aleksusklim opened 3 months ago
Actually I do want to add proper logprobs to the API, not just for one token, I just haven't found the time to get around to it.
If you do the API side, I can try to hack together a userscript that adds my proposed UI to Lite, so we can at least see how it might look, without properly incorporating everything into the Lite source for now.
Will the token probabilities be correctly exposed when
The way I imagine it on the API side would be essentially the same as how logprobs are sent over the OpenAI API (per token).
For Lite, it will probably be a separate panel that opens a table with two columns: one token per row in the first column, and the top 5 logprobs with their text in the second.
Btw for point number 4, ensure that top-k is 0 or > 5, and top-p is set to 1.0 with other truncation samplers disabled.
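For reference, a per-token logprobs payload in the OpenAI style looks roughly like the sketch below. The field names follow the OpenAI chat completions API; the tokens and values are invented for illustration, and the probability shown in the UI would just be `e^logprob`:

```python
import math

# A minimal OpenAI-style "logprobs" payload for two generated tokens.
# Field names follow the OpenAI chat completions API; values are made up.
response_logprobs = {
    "content": [
        {
            "token": " Paris",
            "logprob": -0.12,
            "top_logprobs": [
                {"token": " Paris", "logprob": -0.12},
                {"token": " France", "logprob": -2.80},
            ],
        },
        {
            "token": ".",
            "logprob": -0.50,
            "top_logprobs": [
                {"token": ".", "logprob": -0.50},
                {"token": ",", "logprob": -1.10},
            ],
        },
    ]
}

# Convert logprobs to plain probabilities for display: prob = e^logprob.
for entry in response_logprobs["content"]:
    candidates = ", ".join(
        f"{c['token']!r}: {math.exp(c['logprob']):.1%}"
        for c in entry["top_logprobs"]
    )
    print(f"chose {entry['token']!r} <- {candidates}")
```

Each row of the proposed Lite table would then map to one entry of `content`, with its `top_logprobs` rendered in the second column.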
I was thinking more about factual probabilities than raw logprobs. Meaning that top_k=1 will essentially output greedy tokens, each shown as the only candidate at 100% (as Debug mode does currently). I see some valuable information here, like judging the direct influence of the current sampler parameters (especially the temperature itself).
On the other hand, raw logits might also be interesting… I'm not sure whether one is better than the other, but anyway – if you need to see everything, you may raise the sampler limits, yes. The only downside is that you cannot trigger greedy sampling if you allow anything other than 100% in the outputs, but if we call that "manual sampling" – I think this is not a problem, since you would still be able to tap Home+Enter+Home+Enter… to choose the first candidate each time (as I imagine).
But to stay compatible with the OpenAI API – you have to return logprobs…
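To illustrate the point about judging the temperature's direct influence: the same logits give very different probability lists at different temperatures, while top_k=1 always picks the argmax regardless. A minimal sketch with made-up logits:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities at a given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 3.0, 1.0]                  # invented logits for three candidates

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
# Lower temperature sharpens the distribution, higher temperature flattens it;
# top_k=1 would pick the argmax (first) token in every case, i.e. greedy sampling.
```

This is exactly the kind of effect a probability panel would make visible at a glance.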
one token per row for the first column, and the top 5 logprobs with their text in the second column
Without stacked tables holding a ~unlimited (configurable) number of candidate tokens for each position – this would be the same as just looking at the Debug mode output. There would be no way to see the dynamics: how the previously chosen tokens interactively change the following probabilities.
That's why I said we need to try this to see whether it is really as useful as I imagine! For a start, koboldcpp should return the probabilities to Lite, given a flag and the limits I explained. Hmm, at first even the streaming and abort behavior is not important!
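As a sketch of what that flag could look like on the wire: the endpoint and the other fields below match the existing koboldcpp `/api/v1/generate` payload, while the `logprobs` field is purely my proposed addition (hypothetical, not a real parameter yet):

```python
import json
import urllib.request

# Everything except "logprobs" exists in the current koboldcpp
# /api/v1/generate payload; "logprobs" is the proposed flag asking
# the server to attach the top candidate tokens for each position.
payload = {
    "prompt": "The capital of France is",
    "max_length": 8,
    "temperature": 1.0,
    "top_k": 0,        # top-k disabled, per the advice above...
    "top_p": 1.0,      # ...and top-p at 1.0, so candidates are not truncated
    "logprobs": 5,     # proposed: return the top 5 candidates per token
}

req = urllib.request.Request(
    "http://localhost:5001/api/v1/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)  # uncomment against a running server
```

The response would then carry, alongside the generated text, one candidate list per token, plus the index of the chosen token so the client never has to re-tokenize.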
I can name several use cases where a user might want to see not only the randomly generated text, but an exact list of possibilities for the next token. For example:
I understand that even a direct list of token predictions is not enough to judge everything I mentioned above, simply because the model might start its response with unrelated filler words instead of your expected token (for example, you write ", so the answer is" and among the low-probability tokens it might want to say "as simple as", leaving you with another branching point where you would need to account for the probabilities again), but you still need the ability to see the list somehow!
Why not use the "Debug mode" that already shows probabilities, you might ask? Here is why:
I propose a mode called "Manual sampling". To activate it in Lite, the user clicks on "Chat Select" (to the left of the input box) and in the popup chooses "Manual sampling (reveal token probabilities)" – the new proposed link, below the existing "Impersonate the AI Assistant" and "Make the AI write a response as me (for 1 turn)".
After that, the request is performed as currently (sending everything and clearing the input box), but the interface changes:
Additional entries in Settings:
Instead of generating just one token, we can proceed to the normal stopping conditions and generate as much as usual. But instead of exiting the yellow generation mode (even if Abort was pressed), we can stay in manual sampling: the server should add an array of token probabilities to its response, one entry per generated token, along with the token indices and the chosen one (so the client knows exactly, without tokenizing anything again). This way, Lite can render a pre-populated table stack, so pressing "Rewind" will discard the last token and show the probabilities at the previous one (restarting generation from there if the user chooses a different path).
Hmm, at this point it becomes more convenient to click on the exact part of the yellow text that I want to change… Maybe render tokens as separate HTML elements with a hover hint (highlighting the token boundary and showing its main probability), and with an onclick handler that will "rewind here" in the stack? (Of course, when the text becomes white – all of this is discarded and the stack is freed.) Or at least maybe a "Delete" shortcut key could "Rewind to the beginning", and then you could press "Enter" several times, instantly "typing back" already cached tokens?
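The rewind behavior described above can be modeled as a simple stack of per-token candidate lists. All names here are mine and purely illustrative of the client-side data structure, not actual Lite code:

```python
class ManualSamplingStack:
    """Cache of generated tokens plus the candidate list seen at each step."""

    def __init__(self):
        self.steps = []  # each entry: (chosen_token, candidates)

    def push(self, chosen_token, candidates):
        """Record one generated token and its candidates [(token, prob), ...]."""
        self.steps.append((chosen_token, candidates))

    def rewind(self):
        """Discard the last token; return its candidates for the UI to re-show."""
        _chosen, candidates = self.steps.pop()
        return candidates

    def rewind_to(self, index):
        """Click-on-token behavior: drop everything from position `index` on."""
        candidates = self.steps[index][1]
        del self.steps[index:]
        return candidates

    def text(self):
        return "".join(tok for tok, _ in self.steps)

stack = ManualSamplingStack()
stack.push(" Hello", [(" Hello", 0.62), (" Hi", 0.21)])
stack.push(",", [(",", 0.80), ("!", 0.12)])
print(stack.text())         # prints " Hello,"
cands = stack.rewind_to(0)  # click on the first token: rewind to the start
print(stack.text())         # prints "" (empty, candidates re-shown for step 0)
```

When the text turns white, the whole stack is simply discarded, as described above.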
I pay so much attention to the client interface because it is more important than merely knowing or printing the probabilities: you need a quick and effective way to explore them! But a convenient API is important too, because I can imagine a custom script that does a full "tree search": generating an answer, then picking the most probable "path" (multiplying the intermediate probabilities) that was not explored yet, and generating again until all paths summing to the desired probability have been explored – and then adding up the final probabilities of every branch with the right answer (detected externally somehow) – to estimate the "general probability of a correct answer" for a model, regardless of the wording of its different answers. And since this accounts for ALL of the probabilities, not just temp=0/top_k=1 – such an estimation will be more accurate than greedy sampling or simple retrying.
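That tree-search idea can be sketched as a best-first expansion over a `top_tokens(prefix)` function standing in for a model query. Everything here is illustrative (the function names, the pruning threshold, and the toy model are all mine), not a real koboldcpp client:

```python
import heapq

def estimate_answer_probability(top_tokens, is_correct, is_done,
                                prob_floor=1e-3, max_steps=1000):
    """Best-first search over generation paths.

    top_tokens(prefix) -> [(token, prob), ...]   # stand-in for a model query
    is_done(prefix)    -> True when a path is a complete answer
    is_correct(prefix) -> True when that complete answer is right
    Returns the total probability mass of correct completions among all
    paths whose cumulative probability stays above `prob_floor`.
    """
    frontier = [(-1.0, "")]   # max-heap via negated path probability
    correct_mass = 0.0
    steps = 0
    while frontier and steps < max_steps:
        steps += 1
        neg_p, prefix = heapq.heappop(frontier)
        p = -neg_p
        if is_done(prefix):
            if is_correct(prefix):
                correct_mass += p   # multiply-along-path mass of this branch
            continue
        for token, q in top_tokens(prefix):
            if p * q >= prob_floor:  # prune improbable branches
                heapq.heappush(frontier, (-(p * q), prefix + token))
    return correct_mass

# Toy one-step model: answers "yes" with 0.7 or "no" with 0.3, then stops.
toy = {"": [("yes", 0.7), ("no", 0.3)]}
mass = estimate_answer_probability(
    top_tokens=lambda prefix: toy.get(prefix, []),
    is_correct=lambda text: text == "yes",
    is_done=lambda text: text in ("yes", "no"),
)
print(mass)  # prints 0.7
```

With a real model the candidate lists would come from the proposed probabilities API, and `prob_floor` controls how much of the total mass gets explored.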
Don't get me wrong, I know there are other ways to see the logits, starting from various test scripts or frontends (such as "server" in llama.cpp) and ending with raw calls to the model from code. But I just LOVE KOBOLDCPP SO MUCH that I find everything else inconvenient and complicated to use (or to transfer my settings/models to)! And it is not like "okay, I want to compare logits in this specifically crafted prompt, let me find the right tool"; it is more like "oh, it would be really great to see the probabilities here, right at this moment of my large story…"
Since this change is so complicated, affecting many parts of the existing codebase (and interfering with the most important UI elements), you should consider implementing it only if you truly want to: if you like the idea and see it as actually useful both for yourself and for your casual users! (This should not be done in a hacky manner, otherwise it might complicate maintaining the code later.)
Related (?):