Closed: r0man closed this 4 months ago
About the plz process, sure, that sounds fine. The only reason I would have needed the process at all is to ensure that the event-stream notifications are always in the process buffer; but I think you are ensuring that in the event stream code.
I like the idea of a streaming array parser, thanks for writing it! My current opinion is that we should just add a new function, llm-request-plz-streaming-array, to implement this. It should pass the elements of the array, as parsed JSON, to a callback such as :on-array-element (or something like that). This way it follows the same pattern as the llm-plz-request-event-stream function I added today. Does that sound reasonable?
Hi @ahyatt,
ok, good that you don't need the process buffer. The existing code should take care of making sure all events are processed when the connection closes. The llm-request-plz-streaming-array function sounds like a plan, so let's do this then.
About the OpenAI provider: in the plz branch I see only the llm-chat-streaming function using curl at the moment. Are you planning to change the other functions to use curl as well? This is what I would need for my custom provider at work. My plan is basically to make a provider that inherits from llm-openai-compatible, override all the llm-chat-xxx cl-defmethods to let-bind plz-curl-default-args to add --key and --cert options for mutual TLS, and then call the next applicable method. I think this would work, and I could remove most of the curl code I have there right now.
Some of my colleagues use GPTel, and together with karthink we added this to GPTel: https://github.com/karthink/gptel/pull/221 It allows every provider to customize curl options, and they get set by GPTel when doing HTTP requests. This or something similar would be my ideal solution, so I could "just" customize an llm-openai-compatible provider by tweaking the URL and the curl options. Not sure if you are interested in having something like this, once we've settled on curl? The overriding of the cl-defmethods would also work fine, I think.
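The cl-defmethod override described above could look roughly like this. A minimal sketch: the struct name, slot names, and the choice of overriding llm-chat are illustrative, not code from the actual branch; plz-curl-default-args is plz's real customization variable.

```elisp
;; Sketch only: a provider inheriting from llm-openai-compatible that
;; let-binds plz-curl-default-args per request for mutual TLS.
(cl-defstruct (my-mtls-provider (:include llm-openai-compatible))
  key-file cert-file)

(cl-defmethod llm-chat ((provider my-mtls-provider) prompt)
  ;; Add --key and --cert only for the duration of this request,
  ;; then delegate to the llm-openai-compatible implementation.
  (let ((plz-curl-default-args
         (append plz-curl-default-args
                 (list "--key" (my-mtls-provider-key-file provider)
                       "--cert" (my-mtls-provider-cert-file provider)))))
    (cl-call-next-method)))
```

The same let-binding pattern would be repeated in each llm-chat-xxx method that should use the client certificate.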
@r0man just to be sure, can you confirm that the existing code ensures that all callbacks are executed from the process buffer?
About OpenAI, for some reason I thought I had converted everything, but evidently not! Thanks for noticing that. I just pushed the changes to enable the complete conversion.
About changing curl args as a way to customize, I'm wondering if it's possible to add a way to tweak the plz settings instead. This feels like breaking encapsulation: the caller to plz and friends should have no idea that curl is what is being used. That said, if it's the only way to do something, it's probably fine as long as we limit it carefully and warn users that changes in the plz library might break this.
Hi @ahyatt ,
yes, all callbacks should be called with the current buffer being the curl process buffer. This is the case in the :then and :else callbacks (because I am using the :as buffer option), and in the process filter as well.
Thanks for changing the OpenAI provider. I will give it a try.
What do you mean by tweaking the plz settings? There is a public customization variable called plz-curl-default-args. But for mutual TLS you don't really want to change this globally, since this setting is per request and depends on the host you connect to. As far as I know, this seems to be the only option to teach plz/curl anything about mutual TLS. GnuTLS can read certificates and keys from ~/.authinfo and friends, but does not like my certificates and/or private key.
The curl options aren't too important. I will try overriding the defmethods next.
What I mean is that the options should ideally be plz options you pass when making the request, which plz can then translate to curl options. Something like a :tls-cert option on the plz request, etc.
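The shape of that suggestion might look like the following. Purely hypothetical: no :tls-cert or :tls-key options exist in plz today; this just illustrates the proposed encapsulation.

```elisp
;; Hypothetical plz-level API for mutual TLS; these keyword options
;; do not exist, they are the shape of the suggestion above.
(plz 'get "https://internal.example.com/api"
     :tls-cert "~/certs/client.pem"  ; plz would translate this to curl's --cert
     :tls-key  "~/certs/client.key"  ; plz would translate this to curl's --key
     :then (lambda (body) (message "%s" body)))
```

The caller would then never need to know that curl is the transport underneath.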
Ok, I see. Thanks for the explanation!
Hi @ahyatt,
I converted my custom provider to use the llm-openai-compatible provider with curl. While doing so I found 2 bugs, for which I pushed 2 commits:
With those 2 changes I could use Ellama.
There seems to be a problem with encoding, however. plz sets up some coding system handling which I haven't done yet. I'm going to look at this next.
The issue I see is this:
These default coding systems were tried to encode the following
problematic characters in the buffer ‘ellama hello (Open AI).org’:
Coding System Pos Codepoint Char
undecided-unix 8745 #x1F338 🌸
8760 #x3FFFF0
8761 #x3FFF9F
8762 #x3FFF8C
8763 #x3FFF99
...
utf-8-unix 8760 #x3FFFF0
8761 #x3FFF9F
8762 #x3FFF8C
8763 #x3FFF99
8799 #x3FFFE2
...
However, each of them encountered characters it couldn’t encode:
undecided-unix cannot encode these: 🌸 ...
utf-8-unix cannot encode these: ...
Click on a character (or switch to this window by ‘C-x o’
and select the characters by RET) to jump to the place it appears,
where ‘C-u C-x =’ will give information about it.
Select one of the safe coding systems listed below,
or cancel the writing with C-g and edit the buffer
to remove or modify the problematic characters,
or specify any other coding system (and risk losing
the problematic characters).
raw-text no-conversion
Thank you for the bug fixes! For the coding issue, I think I've seen issues like that before, but I can't remember the solution. You might want to make sure you match what plz is doing, making everything (or maybe nothing) multibyte.
BTW, let me know when the code is ready for me to try to integrate Gemini.
Hi @ahyatt ,
I pushed another change. The on-error callbacks for all providers are supposed to be called with 2 arguments, but we were calling them with one argument, the plz-error object. I explained the 2 cases we have with plz (HTTP error / curl error) in the commit message.
Yes, I will let you know. Right now, I'm only aware of the encoding issue, and I'm working on it.
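Adapting a plz-error to a two-argument callback, distinguishing the two cases mentioned above, could be sketched like this. The helper name and the error-symbol convention are mine; the plz-error and plz-response accessors are from plz itself.

```elisp
;; Sketch: turn a plz-error into the (SYMBOL MESSAGE) pair the llm
;; on-error callbacks expect.  Helper name is illustrative.
(defun my-dispatch-plz-error (plz-err on-error)
  "Call ON-ERROR with an error symbol and a message derived from PLZ-ERR."
  (cond
   ;; Case 1: curl reached the server but got a non-2xx HTTP response.
   ((plz-error-response plz-err)
    (let ((response (plz-error-response plz-err)))
      (funcall on-error 'error
               (format "HTTP %s: %s"
                       (plz-response-status response)
                       (plz-response-body response)))))
   ;; Case 2: curl itself failed (network problem, TLS handshake, ...).
   ((plz-error-curl-error plz-err)
    (funcall on-error 'error
             (format "curl error: %S" (plz-error-curl-error plz-err))))
   ;; Fallback: whatever message plz attached.
   (t (funcall on-error 'error (plz-error-message plz-err)))))
```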
Hi @ahyatt,
I pushed a fix for the encoding issue. I think it is working now. If I ask the AI to generate "interesting" Unicode characters, it does generate some emojis, and those are now displayed properly, at least I believe. :) The popup message about the problem I saw earlier does not appear anymore, so I think it is working.
I would say give it a try!
I changed the media type class a bit. There are now type, subtype and parameters slots, and their values are symbols instead of strings. I changed one occurrence of this in the llm-request-plz module.
Thanks! Let me merge this pull request; the further changes to create a JSON array stream media type can be a next pull request.
Hi @ahyatt ,
thanks for merging this. I think we already have 2 options to continue the migration:
You can find usage examples in the tests here: https://github.com/r0man/plz.el/blob/plz-media-type/tests/test-plz-media-type.el
Please let me know what else we are missing.
Thank you for the pointers! Let me try to port over Vertex / Gemini with this today. Do you plan to take on Ollama, or should I also do that one?
Hi @ahyatt , if you could do Ollama as well, that would be great.
Sure, I'll try Ollama.
On the Gemini / Vertex side, I've added a commit that should address it, but doesn't. The problem is that the content is streamed with a content type of application/json, so it doesn't match the application/json-array media type. I could change things so that we match by the first prefix in the media types alist instead of looking for an exact match, but maybe there's some other way you want to handle this.
I've checked in Ollama, which works well, thanks!
One confusion I had when writing this is that the treatment of callbacks from plz seems arbitrary: do I expect handlers to get plz-response objects, or the response body?
The :then and :else callbacks in plz-media-type are always called with:
So, this should be consistent.
For the media-specific handlers, the arguments are different for each media type. The media type that handles the text/event-stream content type uses the event source and an event from that specification as arguments; the events are specific to that media type. For the other media types it is similar. I did not try to unify anything here.
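Based on that description, registering handlers for the text/event-stream media type might look roughly like this. The :events keyword, the (source event) handler signature, and the event accessor name are my best guesses at the plz-media-type / plz-event-source API; the tests linked earlier are authoritative.

```elisp
;; Sketch: an event-stream media type with per-event handlers.
;; Handlers receive the event source and an event, per the
;; text/event-stream specification.
(plz-media-type:text/event-stream
 :events `((message . ,(lambda (_source event)
                         ;; Assumed accessor for the event's data field.
                         (message "Data: %s"
                                  (plz-event-source-event-data event))))
           (close . ,(lambda (_source _event)
                       (message "Stream closed")))))
```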
I answered your question about the application/json-array media type over here: https://github.com/ahyatt/llm/commit/069bd193efa094ee93cfadf51f0b81b1d5a9b98b#r139986493
Nice that you got Ollama working!
Thanks, your suggestion fixed the issue. However, it seems there are issues parsing array elements out. I debugged into it: we get the first element, and then the buffer with all the remaining elements is passed to plz-media-type--parse-json-object, but that function only parses out the next element, instead of walking through all the remaining elements of the array and parsing them as well. Can you look into this?
Hi @ahyatt, ok, I will take a look at it. If you have suggestions for making the API less confusing, I'm all ears!
Hi @ahyatt,
I have 2 improvements:
This removes the process slot from the plz response struct, which I added initially. We don't need it when using the :as 'buffer option when calling out to plz. We now only depend on one additional change in plz, which is setting the process filter via an option.
This adds the plz-media-type:application/json-array media type, which can be used to stream a JSON array as used by Vertex/Gemini. It's a poor man's streaming JSON parser. You can register a handler that gets called for each object in the outermost JSON array of the response. You can use it in a similar way to the event stream handler.
See the tests in [1] for more examples.
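A rough sketch of what using it might look like. The request function, the :as (media-types ...) wiring, and the :handler keyword are assumptions on my part; the linked tests are the real reference. Note the alist key is application/json here, matching the Gemini/Vertex situation where the array is streamed under that content type.

```elisp
;; Sketch: stream a JSON array and get a callback per array element.
;; Exact class, keyword, and function names are assumed, not verified.
(plz-media-type-request
 'get "https://example.com/stream"
 :as `(media-types
       ((application/json
         . ,(plz-media-type:application/json-array
             :handler (lambda (object)
                        ;; Called once per object in the outermost
                        ;; JSON array, with the object as parsed JSON.
                        (message "Got element: %S" object))))))
 :then (lambda (_response) (message "Stream finished")))
```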
Unfortunately, with Gemini/Vertex wanting a streaming parser for application/json, we now need to distinguish 2 cases:
In order to use this, I would suggest passing the media types via a :media-types option to the following functions:
We could use plz-media-types as the default, but for any provider that needs to register event handlers for a streaming media type, I think it's best if we wire the list of media types (and the callbacks) together in each provider.
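Concretely, the per-provider wiring might look something like this. The function name llm-request-plz-async, the :media-types keyword, and the callback names describe the proposal, not existing code; plz-media-types is assumed to be the library's default media-type alist.

```elisp
;; Sketch of the proposed :media-types option: each provider assembles
;; its own alist, wiring streaming callbacks into the media type.
;; All names below are illustrative.
(llm-request-plz-async
 url
 :media-types `(,@plz-media-types          ; defaults for plain responses
                (text/event-stream
                 . ,(plz-media-type:text/event-stream
                     :events `((message . ,#'my-provider-on-message)))))
 :on-success #'my-provider-on-success
 :on-error #'my-provider-on-error)
```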
Wdyt?
[1] https://github.com/r0man/plz.el/blob/plz-media-type/tests/test-plz-media-type.el#L227-L256