@hronoas I've done some testing with your exact setup and was definitely seeing that GPU usage stays high for some time after pressing stop, like you said. I wondered whether this could be due to us generating summaries of the message for a) the session title and b) chat context. We have a setting in ContinueConfig, disable_summaries. I tried disable_summaries=True, and the GPU usage then dropped immediately after pressing stop.
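For reference, a minimal sketch of enabling this in ~/.continue/config.py (the import path is an assumption and may differ between versions of the extension):

```python
# Sketch only: the exact module path for ContinueConfig is assumed here.
from continuedev.core.config import ContinueConfig

config = ContinueConfig(
    # Skip generating session titles and per-response summaries.
    disable_summaries=True,
)
```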
Are you able to verify similar behavior?
(Also, the "stop" method is a bit unrelated to streaming of responses. It is a method for cleaning up the model altogether (and probably ought to be named accordingly), for example closing any network connections or other resources.)
@sestinj Everything is as you said
Tested with the disable_summaries=False option:

- Stopped the first response: does not interrupt the load on the GPU
- Stopped any of the following messages: interrupts the load on the GPU

Tested with the disable_summaries=True option: when you click the stop button, the load on the GPU stops in both cases.
I don't understand why a second title is being generated... 😨 Maybe a bug? Or an incorrect config?
First generation (by continuedev\plugins\steps\chat.py): "Please write a short title summarizing the message quoted above. Use no more than 10 words:"

Second generation (by continuedev\core\autopilot.py): "Give a short title to describe the above chat session. Do not put quotes around the title. Do not use more than 6 words. The title is:"
Ok, great. This means there is no bug, but since the summaries are no longer being used frequently it might be best to disable them by default.
There are two titles because one of them is the main session title, and the other is a summary that is generated for every LLM response.
@sestinj To my mind it's a bug... For efficient use of resources, pressing the stop button should interrupt generation on the LLM side, and perhaps the chat title should not be created from the first response at all. On low-performance hardware, generating a long response can continue for minutes and cannot be interrupted.
Please check the example output from text-generation-webui for an interrupted first response in my message above. None of the title generations use the full response from the LLM; they use only the part of the response generated before the stop button was pressed.
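For illustration, here is a minimal sketch of how a client can interrupt server-side generation with an OpenAI-compatible streaming endpoint: dropping the HTTP connection mid-stream is typically how the server learns to abort. The URL, payload, and function below are assumptions for illustration, not Continue's actual code:

```python
import asyncio
import aiohttp

async def stream_completion(prompt: str, cancel: asyncio.Event) -> str:
    """Stream tokens; drop the connection as soon as `cancel` is set."""
    text = ""
    async with aiohttp.ClientSession() as session:
        async with session.post(
            "http://localhost:5000/v1/completions",  # assumed endpoint
            json={"prompt": prompt, "stream": True},
        ) as resp:
            async for chunk in resp.content.iter_any():
                if cancel.is_set():
                    # Leaving the block closes the connection; the server
                    # can detect the disconnect and stop generating.
                    break
                text += chunk.decode("utf-8", errors="ignore")
    return text
```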
@hronoas I see what you're saying now. I just made the fix so that if you press stop the title + summary will not be generated.
Once this change has been shipped in a new version I'll let you know!
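In spirit, the fix amounts to something like the following hypothetical sketch (the class, method names, and llm.complete call are illustrative, not the actual continuedev code):

```python
class Autopilot:
    """Hypothetical sketch: skip title/summary work after a cancel."""

    def __init__(self) -> None:
        self.stopped = False

    def on_stop_pressed(self) -> None:
        # Called when the user presses the stop button in the chat UI.
        self.stopped = True

    async def maybe_generate_title(self, llm, chat_text: str):
        if self.stopped:
            # User cancelled: don't spend more GPU time on a title.
            return None
        prompt = (
            "Give a short title to describe the above chat session. "
            "Do not put quotes around the title. "
            "Do not use more than 6 words. The title is: "
        )
        return await llm.complete(chat_text + "\n\n" + prompt)
```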
@hronoas This change has been shipped so going to close the issue. Let me know if you're seeing anything else problematic, or feel free to re-open!
Relevant environment info
Description
I use a local text-generation-webui server with the openai plugin. If you click the stop button in the Continue chat after response generation has begun, text stops appearing in the chat, but text-generation-webui continues generating the response.
When trying to add a custom stop method to a model class in ~/.continue/config.py (sketched below), the logger.debug("STOPPING!") call inside it is never reached: not when the stop button is pressed, not when changing the model, and not when closing VSCode.
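A minimal sketch of the kind of override attempted (the base class and import paths are assumptions and may differ between versions of the Continue Python server):

```python
# Sketch only: module paths below are assumptions.
from continuedev.libs.llm.openai import OpenAI
from continuedev.libs.util.logging import logger

class MyOpenAI(OpenAI):
    async def stop(self):
        # Expected to run when the stop button is pressed (or on
        # model change / VSCode close), but it is never called.
        logger.debug("STOPPING!")
        await super().stop()
```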
To reproduce