Stebalien closed this issue 6 months ago
This is a good idea. For Ollama, do you expect the Ollama-side processing to be canceled as well? We can interrupt the connection, but I think what happens on the Ollama side is up to them.
I think so? It's written in go so I assume the process is tied to the request context.
If not... I'll report a bug upstream.
Sorry for the delay; this should do the trick. Please use the new `llm-cancel-request` method. If this works well, I'll make a release soon, although there are a few more things I'd like to do before the next release.
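For anyone following along, a minimal sketch of how a caller might use this (the model name, prompt text, and prompt constructor here are illustrative; exact names may vary between `llm` versions):

```elisp
(require 'llm)
(require 'llm-ollama)

;; Start a streaming chat and keep the returned request handle.
(let* ((provider (make-llm-ollama :chat-model "llama3"))  ; hypothetical model name
       (request (llm-chat-streaming
                 provider
                 (llm-make-simple-chat-prompt "Write a long story.")
                 (lambda (partial) (message "partial: %s" partial))
                 (lambda (response) (message "done: %s" response))
                 (lambda (err msg) (message "error %s: %s" err msg)))))
  ;; Later (e.g. from a user-invoked command), cancel the in-flight request:
  (llm-cancel-request request))
```

In practice the request handle would be stashed somewhere (a buffer-local variable, say) so a separate cancel command can reach it.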
Hm. So, actually, binding `url-http-async-sentinel` prevents `llm-chat-streaming` from calling either the response or the error callback. It's not, strictly speaking, incorrect. Just a bit annoying.
I would expect that canceling the query shouldn't result in any more callbacks, though. Is your intuition about this different?
Specifically, provide a nice way to cancel queries programmatically (e.g., from ellama).
This is especially important with, e.g., a local LLM like Ollama. I believe it would be sufficient to return the `url-request` process buffer, letting the user run something like `(let (kill-buffer-query-functions) (kill-buffer my-llm-buffer))`.
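Spelled out, the workaround suggested above would look something like this (`my-llm-buffer` is a hypothetical variable holding the returned process buffer):

```elisp
;; Kill the request's process buffer without the usual
;; "Buffer has a running process; kill anyway?" prompt.
;; Binding `kill-buffer-query-functions' to nil suppresses that query.
(let ((kill-buffer-query-functions nil))
  (when (buffer-live-p my-llm-buffer)
    (kill-buffer my-llm-buffer)))
```

Killing the buffer kills its associated network process, which closes the connection; whether the server stops generating at that point is up to the server.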