ahyatt / llm

A package abstracting llm capabilities for emacs.
GNU General Public License v3.0
178 stars 24 forks source link

Run handler code via a timer in the main loop #41

Closed r0man closed 6 months ago

r0man commented 6 months ago

Hi @ahyatt,

here is another update. I have opened another PR on plz.el. This PR allows installing a process filter. alphapapa is fine with merging this when the FSF replies to him. My paperwork is basically done.

https://github.com/alphapapa/plz.el/pull/50

On this PR we discussed another issue that I saw sometimes. It was signaling the following error:

Debugger entered--Lisp error: (wrong-type-argument number-or-marker-p nil)
  <=(200 nil 299)
  (let* ((status val)) (<= 200 status 299))
  (if (let* ((status val)) (<= 200 status 299)) (let ((status val)) (ignore status) (ignore status) (funcall (process-get process :plz-then))) (let nil (let ((err (make-plz-error :response (plz--response)))) (let* ((val (process-get process :plz-else))) (cond ((null val) (let nil (process-put process :plz-result err))) ((functionp val) (let (...) (funcall fn err))) (t (let (...) (error "No clause matching `%S'" x80))))))))

I think I now also know why this is happening. It happened in rare cases when the process filter was still using a function from plz.el that narrowed the buffer for a moment. If the process filter isn't fast enough it can happend that the process exits before the filter is done. If in such a case the buffer is narrowed the error can happen.

I now removed the code that narrows the buffer. alphapapa suggested that it is a bad idea to run callbacks in the process filter, since it should try to be as fast as possible. I looked a bit around and found the jsonrpc.el package. It also uses a process filter and also calls user code. But the way it does this is by scheduling the work on the main loop with a timer.

https://github.com/emacs-mirror/emacs/blob/master/lisp/jsonrpc.el#L785

This might be a better solution and I changed the code now to do the same.

So what now changed is the following:

I tested the Ollama and the Vertex provider with the llm-tester and they are working fine. I tested the OpenAI provider with Ellama and it also seems to work.

There is however an issue with the OpenAI provider and the llm-tester, which seem to be related to the buffer local llm-openai-current-response variable.

It always fails, but at different steps. If I run the failing step alone it works. There seems to be a race condition with the llm-openai-current-response variable, now that we are running code with the timer.

I have the suspicion that some test steps compete in using this variable.

Can you help me with this? Do you think we can somehow get rid of the llm-openai-current-response variable?

Thanks, Roman.

ahyatt commented 6 months ago

Thanks for the update! And I'm looking forward to getting the plz branch out, so thanks again for working on this.

I'm a bit confused on the timer solution, though. Is this equivalent to running the handler code in a different thread? And if so, why not just use threading?

About the use of buffer-local variables, I think I can fix this, and in fact I'm working on something that will fix it now. It's part of my work in updating the package to use generic functions to let packages define the parts that change while having the implementations of the main llm functions be written (mostly) just once. That part is done on the main branch, and now I'm in the middle of making it work for the plz branch. Just due to how things work, it will make sense to replace the buffer-local variables with closure variables.

The one problematic case is for Open AI's function streaming, where we need to keep a structured updated with streaming information. I don't know how I'm going to do that yet. I'm still working on this and it will probably take at least this weekend to straighten out.

r0man commented 6 months ago

Hi @ahyatt,

you can think of the timer solution that instead of running the user callback when the process filter is called (and maybe delaying its completion) the user callback is scheduled to run "later". "Later" is when Emacs is accepting output from a sub process again, which is after the process filter completed. I'm not sure if this is "equivalent" to running the handler in a different Emacs thread. Emacs also only switches to a different thread when accepting output from a process, but I'm not sure if it is the same. Since this is a tricky area, I stayed as close as I could to jsonrpc.el because our use cases are similar. I can experiment with threads if you want, but I'm not sure it will solve the issue about the shared buffer local variable.

I think the issue I ran into is with the tester is that multiple calls are in flight, and share this variable. One test assumes it is a string it can concatenate on, the other thinks it is an object for function calling. At least that was my impression. If we could close over that variable (similar to what Vertex does with streamed-text) I believe it might solve the issue.

I'm not sure if I follow the problematic case about OpenAI. Isn't the state there also local to the llm-chat-streaming function, and we could close over the state there as well?

I'm also looking forward to getting this out. I'm using this at work mostly with Ellama and I haven't noticed any major issues, except for these tricky things around the process filter. Nevertheless it's actually quite stable, and I hope we can also sort out the last issues.

ahyatt commented 6 months ago

Thanks for the explanation!

BTW, I'm looking at the text/event-stream media handler, and it seems like, unlike the other media handlers, this returns a plz-event-source object, whereas others call handlers with just the text or JSON. In this case, it'd just be the text, since event-streams could contain other things than JSON. This poses issues to my generic solution, in which I'd like to have the generic functions defined in the llm providers get just text or JSON. Can you change this?

r0man commented 6 months ago

Hi @ahyatt,

I'm a bit confused because you say "return". I think you mean that the handler is called with the event source object, and the event object, right?

Ideally, I would like the event source and the text/event-stream media type to follow the spec here: https://html.spec.whatwg.org/multipage/webappapis.html#eventhandler

Now that I look at it it, I probably got this wrong, because the handler is called with 2 arguments the event source, and the event object. I initially thought the handler might want access not only the event, but the source as well. So, I can change that. But you are probably more interested in the media type handlers.

Let me explain, how they work right now:

The text/event-stream :handlers is a map from event type to a function. The event type is a symbol as defined in the spec and the function takes 2 arguments, the event source and the event object.

There are some pre-defined event types in the spec. And depending on the event type, the data slot of the event object carries different information.

I think the message type event follows exactly the spec, and this is the we are probably mostly interested in the OpenAI provider. The data slot contains a string. In the case of OpenAI, this is either the JSON object as a string, or the string [DONE].

For the other events I set the data slot to something I thought is useful to the consumer of that event. I thought the open and close events might be interested in the plz-response object and the error event in the plz-error object. I'm not sure if we even need to register open, close and error handlers, since we can also get this through the plz/plz-media-type-request API via :then and :else.

So, that's what we have now. You seem to want the handler to be called with with only the data slot of the event and not the event object itself, is that right? This would be a bit against the event source spec, but I can come up with a modified media type that suits your needs.

I don't fully understand why this is causing trouble though. Can you point me to the code you have so far, so I can better understand it? I will look a this then tomorrow, since it is getting late here.

Also, if I should change the media type, can you please explain how exactly the handlers should be called for the open, close, error and message event types and with which information for the different cases? I will see if I can make that happen then.

r0man commented 6 months ago

Looking at the main branch, it seems you would like the APIs of the text/event-stream, json-array, and ndjson to be more unified. Is that right? Since the event source is kind of well specified, maybe the others should follow a similar API as text/event-stream. So, instead of a single :handler function they also take a :handlers alist, and maybe also have "event types" like open, close, error etc. Would that help?

r0man commented 6 months ago

I would separate building the data structure that describes a request from the action that does the request, and let each provider be able to tweak parts of the request description. Most importantly setting the media types. But I would not expose setting the media type as a generic function.

So maybe something along those lines:

(cl-defmethod llm-chat-streaming-request ((provider llm-standard-chat-provider) prompt partial-callback response-callback error-callback)
  (list :url (llm-provider-chat-streaming-url provider)
        :data (llm-provider-chat-request-data provider prompt t)
        :then (llm-provider-on-success provider response-callback)
        :else (llm-provider-on-error provider error-callback)))

(cl-defmethod llm-chat-streaming-request ((provider llm-openai) prompt partial-callback response-callback error-callback)
  (append (cl-call-next-method provider prompt partial-callback response-callback error-callback)
          (list :media-types
                (cons `(application/json
                        . (plz-media-type:application/json-array
                           :handlers `((message . (lambda (event)
                                                    (llm-provider-on-partial provider partial-callback (json-read (oref event data))))))))
                      plz-media-types))))

(cl-defmethod llm-chat-streaming-request ((provider llm-vertex) prompt partial-callback response-callback error-callback)
  (append (cl-call-next-method provider prompt partial-callback response-callback error-callback)
          (list :media-types
                (cons `(application/json
                        . (plz-media-type:application/json-array
                           :handler (lambda (object)
                                      (llm-provider-on-partial provider partial-callback object))))
                      plz-media-types))))

(cl-defmethod llm-chat-streaming-request ((provider llm-ollama) prompt partial-callback response-callback error-callback)
  (append (cl-call-next-method provider prompt partial-callback response-callback error-callback)
          :media-types
          (cons `(application/json
                  . (plz-media-type:application/x-ndjson
                     :handler (lambda (object)
                                (llm-provider-on-partial provider partial-callback object))))
                plz-media-types)))

(cl-defmethod llm-chat-streaming ((provider llm-standard-chat-provider) prompt partial-callback response-callback error-callback)
  (llm-request-plz (llm-chat-streaming-request provider prompt partial-callback response-callback error-callback)))

That's it for today. Talk to you tomorrow!

ahyatt commented 6 months ago

Thanks for the helpful messages. I think in all the cases you describe, open, error, message, the data is the thing important to the user, so should be just passed as-is to the callback in the event handler. It would be different if the various event handlers needed the other slots, such as last-event-id, origin, or type. But I don't think they do, which is why I think you should just pass the data value to the event handlers. I don't understand how this would be against the spec.

I worked a few hours today and got everything working with plz again and the new structure. Your proposal is interesting on a way to structure the calls. I read it after I had a different solution, though, and I'm not (yet) sure it's solving problem that aren't solved in a more straightforward way. But, it may come in useful, if I run into problems I'll see if this design can help. Thank you!

Looking at the main branch, it seems you would like the APIs of the text/event-stream, json-array, and ndjson to be more unified. Is that right? Since the event source is kind of well specified, maybe the others should follow a similar API as text/event-stream. So, instead of a single :handler function they also take a :handlers alist, and maybe also have "event types" like open, close, error etc. Would that help?

It might be nice to have things more unified, I don't think it's necessary - but the fact that no other callback was using a wrapper but event-handler was seemed wrong to me. It's currently fine with me if things remain separate, but I'd to resolve the event-handler callback issue we've be discussing.

r0man commented 6 months ago

Hi @ahyatt ,

I updated the PR with the following changes:

I noticed that the indentation in many files seems to be strange. I saw the body of a when from left to the actual when symbol. I didn't touch that.

I also updated my own provider to use the generic methods. I works very well and I could throw away a lot of code. Nice job!

I have run the following providers with the changes in this PR through the llm-tester:

It all works and I haven't noticed anything bad.

ahyatt commented 6 months ago

Thanks for the changes and the fixes! About the indentation, yes, something weird is going on and I don't know what yet. It appears I have tabs in my source despite turning off tabs in emacs. I'm still trying to figure out what's going on there.

Going back to the discussion of returning just the event data, do my arguments convince you, or have I missed something?

Also, about the timer change itself - although I merged this in, I'm curious if there is or is not something wrong in plz, since in the plz PR discussion, @alphapapa mentioned that the code is already using the timer technique. Do you understand the confusion?

r0man commented 6 months ago

Hi @ahyatt,

about the event data, I'm not really convinced. I think this is hiding data to a user of an event source. The SSE spec has this example here:

var source = new EventSource('updates.cgi');
source.onmessage = function (event) {
  alert(event.data);
};

And the handlers in the media type serve the same purpose. For a media type that is advertised as being for text/event-stream I think they should receive the same thing as the spec says. So, I would like to keep it that way. Is that a problem for you? I didn't see where this is causing issues yet.

About the timer, I think alphapapa just mentioned that plz does this as well, when I said that I discovered that jsonrpc.el is scheduling work to be done on the main loop from the process filter.

plz does this with run-at-time, scheduling the plz--respond function to be run on the main loop, once the process is finished. If the process filter does a sleep (which it shouldn't) and the process finishes before the filter wrote the response to the buffer, it might still crash trying to find the status code, headers etc. I'm not 100% sure about this, but that's my understanding. The process filter we are using is writing the output as the first step to the process buffer, and it does not sleep. So, I hope we are safe here.

The original issue was such a race condition. And it happened in rare cases. I was using a function from plz in the process filter that temporarily narrowed the body to parse the response. And sometimes the process finished earlier than the filter. In that case plz crashed because it was trying to find the HTTP status code in the buffer that got narrowed in the process filter.

I have now changed this code and we don't narrow anymore. Additionally the process filter doesn't run user code anymore, which was causing it to be terminating after the process in rare cases. Since those changes I haven't seen this happening anymore. I think its still possible that this can happen, but the timer is supposed to make this more unlikely.

I initially had some code in the PR that tried to call the plz--respond function only after the process filter calls were run, by having a kind of watermark counter and waiting for the filter calls to complete. alphapapa suggested to remove this.

ahyatt commented 6 months ago

I see your point. For me, adherence to the spec isn't particularly interesting, but having consistency between the various streaming handler classes is. So I think we still disagree, but I'm willing to live with the current state for now, especially because I want to finish this work.

AFAICT, finishing the work means landing the PR in plz, and renaming the modules that will remain in this library. I agree with @alphapapa that I'd prefer not using eieio if possible, to reduce dependencies and simplify the code. I probably should thoroughly review the code as well, since it will live for some time in the library. Have I got this right?

r0man commented 6 months ago

Yes, we need to get the PR merged. My paperwork is complete but I haven't heard back from Craig to send alphapapa the ok. I put you on CC on that mail. I will ping Craig again if he does not reply in the next days.

About EIEIO, do we really have to split hairs about this? We are already full in with the other half of the Common Lisp extensions. Both are a part of Emacs and how does removing it simplify anything here? It actually adds value by providing documentation and validation for the slots of the classes. Those classes are part of the public API and users are supposed to set these slots. I invested time in writing documentation for them, and defined the types so users can better understand how to set them.

I can open another PR tomorrow to rename the modules. Do you want just the renamed modules or the complete code including the testing infrastructure I have in my existing repositories? Please take a look at the repos to see what's there. If you want the tests as well, can we then also please setup automated tests in the llm repository?

ahyatt commented 6 months ago

About EIEIO, if the code will soon move out to it's own library, then I don't mind much. But if not, the cost is that I need to potentially understand EIEIO when investigating issues, and it's just more complication for me as a maintainer. I do know that you will help if there's anything wrong with these classes while they are part of the llm library, so as long as that's true, removing EIEIO is just a suggestion, not a requirement.

About your testing infratructure, yes, it probably makes sense to include (perhaps in a subdirectory as you do in your own packages).

alphapapa commented 6 months ago

Forgive the intrusion, but another consideration regarding EIEIO may be that of performance. If these functions are being called often and in quick succession, the overhead of EIEIO dispatch might be worth considering vs. simpler alternatives.

r0man commented 6 months ago

What about this alternative:

I ask on emacs-devel if they are interested in adding the 2 repositories to ELPA. Then anyone who wants to use those libraries can, those who don't want to, don't have to. I can develop them independently and I don't have to plz anyone's coding style or political interest. We are now at least 2 months into this and the most important issue hasn't been solved in a way Emacs users can benefit from easily:

Anyone wanting to make a streaming HTTP request today (let's take text/event-stream as an example) has to either:

Additionally, they have to write a parser for the text/event-stream format.

This is what I set out to solve after our initial discussion on the plz issue. Remember, we were talking about JSONRPC and all those formats. We now have a (at least to me satisfying) solution. The plz-media-type repository ships with a couple of batteries included classes that handle some commonly used formats. But users can also bring their own, plz-event-source being an example for this. Both streaming and non-streaming requests are supported.

What's now suggested is to bury the only solution that we have into the llm library and "hide" it from Emacs users. Why are we even doing this? Is this to the best interest of Emacs users?

Say, there is another project that wants to consume text/event-stream (I actually might have one soon, @josephmturner quickly showed up), should I depend on the llm library? Write my own process filter?

Yes, those 2 libraries might still not be perfect and have undiscovered issues. But this can be fixed. Nothing is perfect from the beginning, and it doesn't have to. If we put now all this code into the llm library, I don't see myself "moving the code out" again in the immediate future. I already separated things, so I don't see any benefit in combining the code again, renaming files, and then "moving it out" again. @ahyatt When do you think the "soon" you were talking of actually is? As I said previously, my time on this is ticking, and if the code does what it should, I want to move on to more interesting things.

So, if @alphapapa isn't going to add something similar to plz itself, releasing this to some Emacs package archive is the next best solution in my opinion. For @ahyatt to use it, I guess it should be GNU ELPA, right?

When asking on emacs-devel I can imagine some people asking to support url-retrieve as well. I briefly looked into this, but once I realized that the url library is replacing content in the response buffer with hooks I run away. I now know what @ahyatt meant when he said there is something strange with control characters going on. So, this library is probably tied to plz in the near term future, at least the request handling. I may make the code that parses the buffer more reusable, but probably not now.

If you have better ideas how Emacs users can benefit from this, I'm all ears.

I still think plz-media-type and plz-event-source are good names for the projects. I can also remove the plz- prefix if that helps. I'm also open to other naming suggestions.

I only think if emacs-devel doesn't want to add it to ELPA (which I doubt), renaming the files and copying the code to the llm library is actually a good solution.

ahyatt commented 6 months ago

I agree that asking on the emacs-devel mailing list is a good idea. You can just send a standard GNU ELPA inclusion proposal (see how others do it for an example), and explain the background. I've seen them also be skeptical of EIEIO, though. They tend to accept packages, but there's been debate before about plz and whether we should devote more energy to it versus standard url-retrieve and friends. IMHO, url-retrieve is unworkable, so if that happens, please loop me in.

As far as "soon", I was thinking in 6 months or less. But I agree that trying GNU ELPA is a good approach so that "soon" really is ASAP. Thanks for your patience!

alphapapa commented 5 months ago

@r0man

I would be glad to have plz support streaming responses in general. As I mentioned, I think a good place to start with the API might be to write a plz-stream function that could wrap plz and serve to develop the streaming API. Then, if it seemed appropriate, maybe that support could be folded back into the plz function itself.

I don't think it would be appropriate for the plz library to support event-stream directly, so that would probably best belong in a separate package.

As well, the EIEIO-based, content-type dispatching is more than I'd want to add to plz. That seems more application-specific than general.

But I think that support for both of those collections of functionality could be based on a plz-stream function or API. So I'm still open to the idea of merging a plz-stream API for supporting streaming responses in general (or the equivalent folded back into plz, as mentioned).

If you do want to publish plz-media-type and plz-event-source to ELPA, I'd suggest that their names not include plz, because it seems that their use of plz then becomes an implementation detail, one which you might want to change in the future (e.g. if Emacs were to gain a libcurl-based HTTP API, plz could become obsolete, but the code you wrote in those libraries might just need minor adjustments to use the non-plz API).

Also, if you were to publish those on ELPA, would you be intending to maintain them in the long-term? You've mentioned that your "time is ticking," and you "want to move on to more interesting things," so I'm curious about your plans for that. I can't say that there's an obligation to keep maintaining a package published to ELPA, but in general, it's preferable that packages have active maintainers, and it's good to be clear about intentions.

One other comment:

What's now suggested is to bury the only solution that we have into the llm library and "hide" it from Emacs users. Why are we even doing this? Is this to the best interest of Emacs users?

Please don't misunderstand me: the intention is not to bury or hide anything. The intention is (or would be) to allow code to mature and stabilize before publishing it as a library intended to be supported and used more widely in the long term. As I mentioned, plz itself was in development for about 3 years before being published to ELPA, and then took a bit more development before it reached its current state, which I'm comfortable calling mature. As you've seen, this kind of code tends to come with unforeseen issues which only reveal themselves in time, from use "in anger." It's often better, for both the users and developers, if the initial usage of a library comes from the initial application it's developed for, as it makes changes easier for the developer, and reduces the burden on users to keep up with them.

r0man commented 5 months ago

@alphapapa I don't have the time to continue working on something plz-stream related. plz-media-type-request was my attempt to solve what we initially discussed. I understand that you don't want to add much more code to plz, and that is fine.

About my intentions: My intentions are to support @ahyatt with the migration to use curl for making HTTP requests in the llm library. I don't have plans to make plz-media-type or plz-event-source compatible with a future libcurl based alternative, nor with url-retrieve. The libraries are designed to work with plz and that's it. For that reason I would also keep the plz- prefix, so users can easily find it. I recently saw there is https://github.com/astoff/plz-see.el, so I think it would make sense to keep that prefix for my libraries as well, since it is tied to plz.

@ahyatt what about the other approach I mentioned. I continue work in my repositories and make sure it works with the llm library. We "vendor" plz-media-type and plz-event-source by copying them over from time to time and distribute the files for now with the llm library. If nothing serious comes up, and they were used "in anger" we submit the plz-media-type and plz-event-source libraries to ELPA in a couple of months. I make sure nothing breaks, should I change or improve something, will open PRs to the llm library and test all of this thoroughly. What I'm less keen on doing is to move all those files into the llm library, rename them, just to tear them apart, and rename them again once we publish this to ELPA.

ahyatt commented 5 months ago

@r0man, keeping the files in my library temporarily sounds reasonable, but please add comments to that effect in the files themselves, so people know these will soon move out to their own repository. Let's have a goal to move them out no later than 6 months from now (also note this in the files). Thanks!

r0man commented 5 months ago

@ahyatt I pushed a change to the plz branch/PR with a note about the files being vendor-ed for now, and when we plan to submit them to ELPA.