Closed: r0man closed this 4 months ago
About the plz process, sure, that sounds fine. The only reason I would have needed the process at all is to ensure that the event-stream notifications are always in the process buffer; but I think you are ensuring that in the event stream code.
I like the idea of a streaming array parser, thanks for writing it! My current opinion is that we should just add a new function, llm-request-plz-streaming-array, to implement this. It should pass the elements of the array, as parsed JSON, to a callback such as :on-array-element (or something like that). This way it follows the same pattern as the llm-plz-request-event-stream function I added today. Does that sound reasonable?
Hi @ahyatt,
ok, good that you don't need the process buffer. The existing code should take care of making sure all events are processed when the connection closes. The llm-request-plz-streaming-array function sounds like a plan, so let's do this then.
About the OpenAI provider: in the plz branch I see only the llm-chat-streaming function using curl at the moment. Are you planning to change the other functions to use curl as well? This is what I would need for my custom provider at work. My plan is basically to make a provider that inherits from llm-openai-compatible, override all the llm-chat-xxx cl-defmethods to let-bind plz-curl-default-args to add --key and --cert options for mutual TLS, and then call the next applicable method. I think this would work, and I could remove most of the curl code I have there right now.
Some of my colleagues use GPTel, and together with karthink we added this to GPTel: https://github.com/karthink/gptel/pull/221 It allows every provider to customize curl options, and they get set by GPTel when doing HTTP requests. This or something similar would be my ideal solution, so I could "just" customize an llm-openai-compatible provider by tweaking the URL and the curl options. Not sure if you are interested in having something like this, once we've settled on curl? The overriding of the cl-defmethods would also work fine, I think.
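The cl-defmethod override described above could look roughly like this. A minimal sketch: the struct name, slot names, and the choice of overriding llm-chat are illustrative, not code from the actual branch; plz-curl-default-args is plz's real customization variable.

```elisp
;; Sketch only: a provider inheriting from llm-openai-compatible that
;; let-binds plz-curl-default-args per request for mutual TLS.
(cl-defstruct (my-mtls-provider (:include llm-openai-compatible))
  key-file cert-file)

(cl-defmethod llm-chat ((provider my-mtls-provider) prompt)
  ;; Add --key and --cert only for the duration of this request,
  ;; then delegate to the llm-openai-compatible implementation.
  (let ((plz-curl-default-args
         (append plz-curl-default-args
                 (list "--key" (my-mtls-provider-key-file provider)
                       "--cert" (my-mtls-provider-cert-file provider)))))
    (cl-call-next-method)))
```

The same let-binding pattern would be repeated in each llm-chat-xxx method that should use the client certificate.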
@r0man just to be sure, can you confirm that the existing code ensures that all callbacks are executed from the process buffer?
About OpenAI, for some reason I thought I had converted everything, but evidently not! Thanks for noticing that. I just pushed the changes to enable the complete conversion.
About changing curl args as a way to customize, I'm wondering if it's possible to add a way to tweak the plz settings instead. This feels like breaking encapsulation: the caller to plz and friends should have no idea that curl is what is being used. That said, if it's the only way to do something, it's probably fine as long as we limit it carefully and warn users that changes in the plz library might break this.
Hi @ahyatt ,
yes, all callbacks should be called with the current buffer being the curl process buffer. This is the case in the :then and :else callbacks (because I am using the :as buffer option), and in the process filter as well.
Thanks for changing the OpenAI provider. I will give it a try.
What do you mean by tweaking the plz settings? There is a public customization variable called plz-curl-default-args. But for mutual TLS you don't really want to change this globally, since this setting is per request and depends on the host you connect to. As far as I know, this seems to be the only option to teach plz/curl anything about mutual TLS. GnuTLS can read certificates and keys from ~/.authinfo and friends, but does not like my certificates and/or private key.
The curl options aren't too important. I will try overriding the defmethods next.
What I mean is that the options should ideally be plz options you pass when making the request, which plz can then translate to curl options. Something like a :tls-cert option on the plz request, etc.
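The shape of that suggestion might look like the following. Purely hypothetical: no :tls-cert or :tls-key options exist in plz today; this just illustrates the proposed encapsulation.

```elisp
;; Hypothetical plz-level API for mutual TLS; these keyword options
;; do not exist, they are the shape of the suggestion above.
(plz 'get "https://internal.example.com/api"
     :tls-cert "~/certs/client.pem"  ; plz would translate this to curl's --cert
     :tls-key  "~/certs/client.key"  ; plz would translate this to curl's --key
     :then (lambda (body) (message "%s" body)))
```

The caller would then never need to know that curl is the transport underneath.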
Ok, I see. Thanks for the explanation!
Hi @ahyatt,
I converted my custom provider to use the llm-openai-compatible provider with curl. While doing so I found 2 bugs, for which I pushed 2 commits:
With those 2 changes I could use Ellama.
There seems to be a problem with encoding, however. plz sets up some coding system handling which I haven't done yet. I'm going to look at this next.
The issue I see is this:
These default coding systems were tried to encode the following
problematic characters in the buffer ‘ellama hello (Open AI).org’:
Coding System Pos Codepoint Char
undecided-unix 8745 #x1F338 🌸
8760 #x3FFFF0
8761 #x3FFF9F
8762 #x3FFF8C
8763 #x3FFF99
...
utf-8-unix 8760 #x3FFFF0
8761 #x3FFF9F
8762 #x3FFF8C
8763 #x3FFF99
8799 #x3FFFE2
...
However, each of them encountered characters it couldn’t encode:
undecided-unix cannot encode these: 🌸 ...
utf-8-unix cannot encode these: ...
Click on a character (or switch to this window by ‘C-x o’
and select the characters by RET) to jump to the place it appears,
where ‘C-u C-x =’ will give information about it.
Select one of the safe coding systems listed below,
or cancel the writing with C-g and edit the buffer
to remove or modify the problematic characters,
or specify any other coding system (and risk losing
the problematic characters).
raw-text no-conversion
Thank you for the bug fixes! For the coding issue, I think I've seen issues like that before, but I can't remember the solution. You might want to make sure you match what plz is doing, making everything (or maybe nothing) multibyte.
BTW, let me know when the code is ready for me to try to integrate Gemini.
Hi @ahyatt ,
I pushed another change. The on-error callbacks for all providers are supposed to be called with 2 arguments, but we were calling them with one argument, the plz-error object. I explained the 2 cases we have with plz (HTTP error / curl error) in the commit message.
Yes, I will let you know. Right now, I'm only aware of the encoding issue, and I'm working on it.
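Adapting a plz-error to a two-argument callback, distinguishing the two cases mentioned above, could be sketched like this. The helper name and the error-symbol convention are mine; the plz-error and plz-response accessors are from plz itself.

```elisp
;; Sketch: turn a plz-error into the (SYMBOL MESSAGE) pair the llm
;; on-error callbacks expect.  Helper name is illustrative.
(defun my-dispatch-plz-error (plz-err on-error)
  "Call ON-ERROR with an error symbol and a message derived from PLZ-ERR."
  (cond
   ;; Case 1: curl reached the server but got a non-2xx HTTP response.
   ((plz-error-response plz-err)
    (let ((response (plz-error-response plz-err)))
      (funcall on-error 'error
               (format "HTTP %s: %s"
                       (plz-response-status response)
                       (plz-response-body response)))))
   ;; Case 2: curl itself failed (network problem, TLS handshake, ...).
   ((plz-error-curl-error plz-err)
    (funcall on-error 'error
             (format "curl error: %S" (plz-error-curl-error plz-err))))
   ;; Fallback: whatever message plz attached.
   (t (funcall on-error 'error (plz-error-message plz-err)))))
```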
Hi @ahyatt,
I pushed a fix for the encoding issue. I think it is working now. If I ask the AI to generate "interesting" Unicode characters, it does generate some emojis, and those are now displayed properly, at least I believe. :) The popup message about the problem I saw earlier does not appear anymore, so I think it is working.
I would say give it a try!
I changed the media type class a bit. There are now type, subtype and parameters slots, and their values are symbols instead of strings. I changed one occurrence of this in the llm-request-plz module.
Thanks! Let me merge this pull request; the further changes to create a JSON array stream media type can be a next pull request.
Hi @ahyatt ,
thanks for merging this. I think we already have 2 options to continue the migration:
You can find usage examples in the tests here: https://github.com/r0man/plz.el/blob/plz-media-type/tests/test-plz-media-type.el
Please let me know what else we are missing.
Thank you for the pointers! Let me try to port over Vertex / Gemini with this today. Do you plan to take on Ollama, or should I also do that one?
Hi @ahyatt , if you could do Ollama as well, that would be great.
Sure, I'll try Ollama.
On the Gemini / Vertex side, I've added a commit that should address it, but doesn't. The problem is that the content is streamed with a content type of application/json, so it doesn't match the application/json-array media type. I could change things so that we match by the first prefix in the media types alist instead of looking for an exact match, but maybe there's some other way you want to handle this.
I've checked in Ollama, which works well, thanks!
One confusion I had when writing this is that the treatment of callbacks from plz seems arbitrary: do I expect handlers to get plz-response objects, or the response body?
The :then and :else callbacks in plz-media-type are always called with:
So, this should be consistent.
For the media-specific handlers, the arguments are different for each media type. The media type that handles the text/event-stream content type uses the event source and an event from that specification as arguments; the events are specific to that media type. For the other media types it is similar. I did not try to unify anything here.
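Based on that description, registering handlers for the text/event-stream media type might look roughly like this. The :events keyword, the (source event) handler signature, and the event accessor name are my best guesses at the plz-media-type / plz-event-source API; the tests linked earlier are authoritative.

```elisp
;; Sketch: an event-stream media type with per-event handlers.
;; Handlers receive the event source and an event, per the
;; text/event-stream specification.
(plz-media-type:text/event-stream
 :events `((message . ,(lambda (_source event)
                         ;; Assumed accessor for the event's data field.
                         (message "Data: %s"
                                  (plz-event-source-event-data event))))
           (close . ,(lambda (_source _event)
                       (message "Stream closed")))))
```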
I answered your question about the application/json-array media type over here: https://github.com/ahyatt/llm/commit/069bd193efa094ee93cfadf51f0b81b1d5a9b98b#r139986493
Nice that you got Ollama working!
Thanks, your suggestion fixed the issue. However, it seems there are issues parsing array elements out. I debugged into it: we get the first element, and then the buffer with all the remaining elements is passed to plz-media-type--parse-json-object, but that function only parses out the next element, instead of walking through all the remaining elements of the array and parsing them as well. Can you look into this?
Hi @ahyatt, ok, I will take a look at it. If you have suggestions for making the API less confusing, I'm all ears!
Hi @ahyatt,
I have 2 improvements:
This removes the process slot from the plz response struct, which I added initially. We don't need it when using the :as 'buffer option when calling out to plz. We now only depend on one additional change in plz, which is setting the process filter via an option.
This adds the plz-media-type:application/json-array media type, which can be used to stream a JSON array as used by Vertex/Gemini. It's a poor man's streaming JSON parser. You can register a handler that gets called for each object in the outermost JSON array of the response. You can use it in a similar way to the event stream handler.
See the tests in [1] for more examples.
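A rough sketch of what using it might look like. The request function, the :as (media-types ...) wiring, and the :handler keyword are assumptions on my part; the linked tests are the real reference. Note the alist key is application/json here, matching the Gemini/Vertex situation where the array is streamed under that content type.

```elisp
;; Sketch: stream a JSON array and get a callback per array element.
;; Exact class, keyword, and function names are assumed, not verified.
(plz-media-type-request
 'get "https://example.com/stream"
 :as `(media-types
       ((application/json
         . ,(plz-media-type:application/json-array
             :handler (lambda (object)
                        ;; Called once per object in the outermost
                        ;; JSON array, with the object as parsed JSON.
                        (message "Got element: %S" object))))))
 :then (lambda (_response) (message "Stream finished")))
```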
Unfortunately, with Gemini/Vertex wanting a streaming parser for application/json, we now need to distinguish 2 cases:
In order to use this, I would suggest passing the media types via a :media-types option to the following functions:
We could use plz-media-types as the default, but for any provider that needs to register event handlers for a streaming media type, I think it's best if we wire the list of media types (and the callbacks) together in each provider.
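Concretely, the per-provider wiring might look something like this. The function name llm-request-plz-async, the :media-types keyword, and the callback names describe the proposal, not existing code; plz-media-types is assumed to be the library's default media-type alist.

```elisp
;; Sketch of the proposed :media-types option: each provider assembles
;; its own alist, wiring streaming callbacks into the media type.
;; All names below are illustrative.
(llm-request-plz-async
 url
 :media-types `(,@plz-media-types          ; defaults for plain responses
                (text/event-stream
                 . ,(plz-media-type:text/event-stream
                     :events `((message . ,#'my-provider-on-message)))))
 :on-success #'my-provider-on-success
 :on-error #'my-provider-on-error)
```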
Wdyt?
[1] https://github.com/r0man/plz.el/blob/plz-media-type/tests/test-plz-media-type.el#L227-L256