Open morbidCode opened 2 months ago
Generally, the goal of --quiet is to minimize terminal output, so if this is added in the future it would be within the API only. You can currently query /api/extra/perf/ to determine whether a request is in progress, although token information is not available until generation is complete (unless you use polled streaming).
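For anyone who wants to try the perf endpoint as a progress check, here is a rough Python sketch. It assumes koboldcpp is listening on http://localhost:5001 and that the JSON returned by /api/extra/perf/ includes an `idle` field (0 while a request is running) and a `queue` field. Those field names are my assumption, so check the actual response from your build. The helper names (`fetch_perf`, `is_generating`, `describe`, `watch`) are made up for illustration.

```python
import json
import time
import urllib.request

# Assumption: koboldcpp is running locally on its usual port.
BASE_URL = "http://localhost:5001"


def fetch_perf(base_url=BASE_URL, timeout=5.0):
    """GET /api/extra/perf/ and return the parsed JSON as a dict."""
    with urllib.request.urlopen(f"{base_url}/api/extra/perf/", timeout=timeout) as resp:
        return json.load(resp)


def is_generating(perf):
    """Return True if the perf payload reports a request in progress.

    Assumption: the payload has an "idle" field that is 0 while busy.
    If the field is missing, the server is treated as idle.
    """
    return int(perf.get("idle", 1)) == 0


def describe(perf):
    """Build a one-line status string from a perf payload."""
    if is_generating(perf):
        # Assumption: "queue" counts requests waiting behind the current one.
        return f"busy (queue: {perf.get('queue', 0)})"
    return "idle"


def watch(interval=2.0, base_url=BASE_URL):
    """Poll the endpoint and rewrite a single terminal line in place."""
    while True:
        status = describe(fetch_perf(base_url))
        # "\r" returns to the start of the line; padding clears leftovers.
        print(f"\rkoboldcpp: {status:<30}", end="", flush=True)
        time.sleep(interval)
```

Calling `watch()` in a second terminal prints one status line that updates in place until you press Ctrl+C. It only tells you whether something is running; as noted above, token counts are not available this way until generation finishes.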
Got it. But will the multiuser flag affect the API call? Suppose I am running inference with the Kobold UI, and then I call /api/extra/perf/. Will koboldcpp classify that as 2 users, given that the default value of multiuser is 1?
It will be fine
Hello. If --quiet is not set, we usually get this during inference:
but this also outputs the prompts and the response. On the other hand, if --quiet is set, it silences everything except the stats at the end of the response.
Would it be possible to output the "generating" portion even if --quiet is set? I think this would not take up too many lines in the terminal, since it updates in place instead of creating a new line. The use case is very slow models, where it would be nice to see whether generation is about to finish (like generating: 500/512 tokens), and non-streaming setups, where it would show whether inference is running at all. Thanks!