jadengis closed this 5 months ago
Nice!!
@jadengis from what I've seen in the code so far, it appears that the Google AI server only returns a single version of the assistant's message. For instance, the OpenAI API has the `n` parameter for the number of output versions the server should generate. Think of it like running the same step in the conversation with different seeds and seeing how differently each version is generated.
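As a concrete illustration of the `n` parameter mentioned above (a minimal sketch: the request fields follow the public Chat Completions API, but the response dict here is mocked, not a real server reply):

```python
# Minimal sketch of OpenAI's "n" parameter. Field names follow the public
# Chat Completions API; no network call is made here.
payload = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello"}],
    "n": 3,  # ask the server for 3 alternative completions
}

# The response then carries one entry per completion in "choices" (mocked
# below), which is why the return type allows a list of messages.
mock_response = {
    "choices": [
        {"index": i, "message": {"role": "assistant", "content": f"variant {i}"}}
        for i in range(payload["n"])
    ]
}
messages = [choice["message"] for choice in mock_response["choices"]]
print(len(messages))  # one message per requested candidate -> 3
```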
Anyway, I've never seen another model do that and it complicates the return types. From what you've seen of the Google API, does it have that capability?
I'm considering removing support for that and cleaning up the return type for `ChatOpenAI.call`:

```elixir
@type call_response :: {:ok, Message.t() | [Message.t()]} | {:error, String.t()}
```

It would just be `{:ok, Message.t()}` instead of an optional array of messages.
What are your thoughts?
@brainlid I think the Google AI API actually does support returning multiple versions of the message. The response JSON for the generation method in the docs contains a `candidates` array, which I believe should contain all the versions that were generated. By default it seems to generate only 1 version.
The option for setting this seems undocumented, however. It doesn't appear in the model parameters documentation, but I did find a `candidatesCount` option in the official JavaScript SDK, so I think it should work. That is, sending a request like
```json
{
  "contents": [
    { "parts": [{ "text": "User message" }] }
  ],
  "generationConfig": {
    "candidatesCount": 2
  }
}
```
should return 2 candidates in the response.
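For what it's worth, consuming such a response would look something like this sketch (the `candidates` → `content` → `parts` nesting matches the documented response schema; the literal JSON below is fabricated for illustration):

```python
import json

# Fabricated example of a multi-candidate Gemini response. The
# candidates/content/parts/text nesting follows the documented schema;
# the text values are made up.
response = json.loads("""
{
  "candidates": [
    {"content": {"role": "model", "parts": [{"text": "Version A"}]}},
    {"content": {"role": "model", "parts": [{"text": "Version B"}]}}
  ]
}
""")

# One generated message per candidate, mirroring OpenAI's "choices" array.
texts = [c["content"]["parts"][0]["text"] for c in response["candidates"]]
print(texts)  # -> ['Version A', 'Version B']
```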
I personally don't have a use case for returning multiple versions of a message, but in the interest of keeping things flexible, and since there are two big LLM providers that support it, it probably makes sense to leave the type as is. I can also imagine scenarios where generating multiple candidates would be useful.
@brainlid Hey, is there any way I can help to get this PR into main? Willing to pitch in if there is any preliminary work required. :pray:
@brainlid interested in your thoughts here! I have been looking into integrating Ollama chat, and if this PR is merged, it creates a nice seam for me to start.
Do you have any hesitations on implementation?
@medoror: Have you used this PR? Have you tested with it at any level?
@jadengis I'm not currently setup for testing/verifying the Google endpoint. However, if you're able to help support/fix issues with the integration, then I'm okay to merge it in.
@brainlid I'm using the Google engine currently in an application, so I've got no problem supporting/fixing issues with the integration. I'll more likely than not need those fixes anyway. As written, it's been working in production without issue for 3–4 weeks.
There are a few merge conflicts it looks like. Are there any big changes I should pay attention to in resolving these conflicts? :pray:
@jadengis Sounds good! The merge conflicts should be pretty clean. Can you merge them into your branch?
@brainlid I've updated the PR to be in line with the current `main`. Updated the Google model to use the same `into` Req trick that you added to OpenAI for streaming, and factored the chunk-processing code out into a shared location. That's pretty much all the changes. Let me know if it looks good to you :pray:
@jadengis Thanks for all the work you've put into this!
❤️💛💙💜
Summary
This PR adds a `ChatGoogleAI` model that wraps interactions with the Google AI REST APIs for the purposes of integrating with LangChain, thus closing #6. This change supports the full set of Gemini Pro features, including non-streamed responses, streamed responses, and function calling.
Details
Differences with OpenAI
- Uses the `"model"` role instead of `"assistant"`.
- Implemented the `for_api/` behaviour using plain old pattern matching. The protocol approach seems a little roundabout.

Quirks
- Streaming requires an `alt=sse` query param added to the URL. This is undocumented, but I noticed it being used in the official SDKs.
- The API returns `finishReason: "STOP"` for basically everything, including message deltas. This doesn't jive well with some of the existing logic for tracking when, e.g., streaming deltas complete. This behaviour is faked in the `ChatGoogleAI` module.
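To illustrate the `finishReason` quirk described above (a sketch only: the chunk dicts below are simplified stand-ins, not the exact wire format, and the workaround shown is one possible strategy rather than the module's literal implementation):

```python
# Google sends finishReason "STOP" on every streamed delta, so it cannot be
# used to detect the final chunk. Simplified, fabricated chunks:
chunks = [
    {"candidates": [{"content": {"parts": [{"text": "Hel"}]}, "finishReason": "STOP"}]},
    {"candidates": [{"content": {"parts": [{"text": "lo!"}]}, "finishReason": "STOP"}]},
]

# Naive approach: stop at the first finishReason == "STOP" -> truncates output.
naive = []
for chunk in chunks:
    cand = chunk["candidates"][0]
    naive.append(cand["content"]["parts"][0]["text"])
    if cand.get("finishReason") == "STOP":
        break
print("".join(naive))  # -> "Hel" (truncated)

# Workaround in the spirit described above: consume the whole SSE stream and
# only mark the message complete when the stream itself closes.
full = "".join(c["candidates"][0]["content"]["parts"][0]["text"] for c in chunks)
print(full)  # -> "Hello!"
```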