Open · coryfoo opened 1 month ago
The documentation for image and audio uploads seems to suggest that Gemini does not support one-shot model prompts with the image data embedded directly in the content, as other models do; instead, you must upload a file and reference that upload in the prompt. Because of that, I wasn't sure of the best approach for adding support for that paradigm and would be happy to chat about potential implementation strategies. @dbrewster @LukeLalor ?
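For reference, the upload-and-reference flow looks roughly like this with the `google-generativeai` Python SDK (a minimal sketch; the file path and prompt text are illustrative):

```python
# Sketch of Gemini's upload-and-reference paradigm using the
# google-generativeai SDK. The file path and prompt are illustrative.
import google.generativeai as genai

genai.configure(api_key="...")  # assumes an API key is configured

# The media must be uploaded first; the returned handle is then passed
# in the prompt instead of embedding raw bytes in the message content.
audio = genai.upload_file(path="speech.mp3")

model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content([audio, "Transcribe this audio clip."])
print(response.text)
```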
What is their API? Uploading the docs would be fine, but we would obviously need to keep track of what has been uploaded. That is workable, just another piece of complexity floating around (a sketch of that bookkeeping is below). We could also have a V1 impl that does not support multi-media natively and relies on the image processors instead (this is how we do multi-media support for text-only models). Either option would be perfectly reasonable, but I would probably implement the second and hold off on the first until we have customers clamoring for it. I am on vacation this week, but will ping @dbrewster to see if he has thoughts.
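If the upload-tracking route ever becomes necessary, the bookkeeping could be as small as a content-hash cache in front of the upload call. Everything in this sketch (the `upload_once` helper, the module-level cache) is hypothetical and not part of the codebase:

```python
# Hypothetical bookkeeping for the upload option: upload each file at most
# once, keyed by a hash of its contents. Names here are illustrative only.
import hashlib

import google.generativeai as genai

# Maps sha256 digest of file contents -> the File handle returned by upload.
_upload_cache = {}

def upload_once(path: str):
    """Upload a file once and reuse its handle on subsequent calls."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    if digest not in _upload_cache:
        _upload_cache[digest] = genai.upload_file(path=path)
    return _upload_cache[digest]
```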
Adds basic support for the Google Gemini LLM (`gemini-1.5-flash`). Does not include Gemini tool support for multi-modal interactions with Gemini-based models (i.e., no speech, pictures, etc.).

I've taken the code from the existing branch and updated it so that the protobuf serialization works, specifically when declaring tool support. You can interact with a chat-only version of this implementation, which works fine.
The tool-calling portions are untested (I could use some help setting this up locally!). A sketch of the declaration shape follows for reference.
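For anyone setting this up locally, the declaration side looks roughly like this against the `google-generativeai` SDK (a minimal sketch; the `get_weather` function and its schema are illustrative, not from this PR):

```python
# Sketch of declaring a tool (function) for Gemini via the SDK's protobuf
# types. The get_weather function and its schema are illustrative only.
import google.generativeai as genai

get_weather = genai.protos.FunctionDeclaration(
    name="get_weather",
    description="Look up the current weather for a city.",
    parameters=genai.protos.Schema(
        type=genai.protos.Type.OBJECT,
        properties={"city": genai.protos.Schema(type=genai.protos.Type.STRING)},
        required=["city"],
    ),
)

model = genai.GenerativeModel(
    "gemini-1.5-flash",
    tools=[genai.protos.Tool(function_declarations=[get_weather])],
)

# Sending any message exercises the tool-declaration serialization path,
# even if the model never actually calls the tool.
chat = model.start_chat()
response = chat.send_message("What's the weather in Paris?")
print(response.candidates)
```

Declaring the tool and sending a single message exercises the protobuf serialization path without needing a full tool-execution loop, which makes it a convenient local smoke test.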