ggerganov / llama.cpp

LLM inference in C/C++
MIT License

error: unknown argument: --logits-all #4184

Closed AntonioZC666 closed 3 months ago

AntonioZC666 commented 7 months ago

Hi! I'm a beginner in performance optimization and have recently been using llama.cpp as my workload. The model I'm using is llama-2-7b-chat.Q3_K_M.gguf.

I want to know the number of tokens in the text generated for each input. How can I get it?

In the parameters help I see --logits-all return logits for all tokens in the batch (default: disabled). However, when I used it I got this error: error: unknown argument: --logits-all.

How can I get this token information? Can the --logits-all parameter help me? If so, how should I use it? Thank you.

KerfuffleV2 commented 7 months ago

That option got removed at some point but the help text wasn't updated. I don't think it ever had an effect that would have been useful from the commandline though.

Also, unfortunately the command-line option handling is sort of janky - the help lists basically all the options that any of the examples accepts. So you'll see options in --help that only some examples support, and the example you're currently running may not support a given option at all.

AntonioZC666 commented 7 months ago

That option got removed at some point but the help text wasn't updated. I don't think it ever had an effect that would have been useful from the commandline though.

Also, unfortunately the command-line option handling is sort of janky - the help lists basically all the options that any of the examples accepts. So you'll see options in --help that only some examples support, and the example you're currently running may not support a given option at all.

Thank you. And can I get the number of tokens for the text generated by each input? I looked up a lot of posts, but none of them mention this.

KerfuffleV2 commented 7 months ago

And can I get the number of tokens for the text generated by each input?

Do you mean the top-N most likely tokens? I don't think you can get that information from the command line, but the server example can return it in its query responses if you ask for it (see the README in the examples/server directory). Or, of course, you can always use llama.cpp as an API and do whatever you want when sampling.
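For anyone landing here, a minimal sketch of the server route (not from the thread itself): it assumes the server example is running locally on port 8080, and that your build's /completion endpoint accepts the n_probs request field and returns tokens_predicted and completion_probabilities as described in examples/server/README.md. Those field names come from the README and may differ between llama.cpp versions, so treat this as an illustration rather than a reference.

```python
# Rough sketch: query the llama.cpp server example for per-token data.
# Assumes the server is already running at 127.0.0.1:8080; field names
# (n_probs, tokens_predicted, completion_probabilities) may vary by version.
import json
import urllib.request

payload = {
    "prompt": "Building a website can be done in 10 simple steps:",
    "n_predict": 64,   # maximum number of tokens to generate
    "n_probs": 5,      # request the top-5 probabilities for each generated token
}

req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

# Number of tokens the model actually generated for this input
print("tokens generated:", result.get("tokens_predicted"))

# Top candidate tokens and their probabilities, one entry per generated token
for entry in result.get("completion_probabilities", []):
    print(entry)
```

Here tokens_predicted would be the number of tokens generated for the request, and each completion_probabilities entry lists the top candidates for one generated token. Start the server first with something like ./server -m llama-2-7b-chat.Q3_K_M.gguf (the binary name depends on your build).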

github-actions[bot] commented 3 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.