google-gemini / generative-ai-swift

The official Swift library for the Google Gemini API
https://ai.google.dev/gemini-api/docs/get-started/tutorial?lang=swift
Apache License 2.0

Support prompting with media files #201

Open longseespace opened 1 month ago

longseespace commented 1 month ago

Description of the feature request:

The Gemini API supports uploading media files separately from the prompt input, allowing your media to be reused across multiple requests and multiple prompts.

https://ai.google.dev/gemini-api/docs/prompting_with_media?lang=python
https://ai.google.dev/api/files

What problem are you trying to solve with this feature?

Add the ability to upload a document from a client app and use it in prompts.

Any other information you'd like to share?

No response

andrewheard commented 1 month ago

Hi @longseespace, media files that have already been uploaded with the server-side SDKs (Python, Go, Node.js) or the REST API can be referenced in the Swift SDK using fileData, e.g.:

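import GoogleGenerativeAI

// Model name and API key are placeholders; substitute your own.
let model = GenerativeModel(name: "gemini-1.5-flash", apiKey: "YOUR_API_KEY")

// Reference a file previously uploaded through the File API by its URI.
let response = try await model.generateContent(
  ModelContent.Part.fileData(
    mimetype: "image/jpeg",
    uri: "https://generativelanguage.googleapis.com/v1beta/files/some-hash"
  ),
  "What is in this image?"
)
print(response.text ?? "No text in response")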
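For the REST route mentioned above, the following is a minimal sketch of calling the File API's resumable upload endpoint directly from Swift with URLSession to obtain a URI for fileData. It assumes the two-step protocol from the File API REST examples (a "start" request whose response carries an X-Goog-Upload-URL header, then an "upload, finalize" request carrying the bytes) and the v1beta upload endpoint; the helper names makeUploadStartRequest, makeUploadFinalizeRequest, parseFileURI, and uploadFile are hypothetical and are not part of the Swift SDK.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest and URLSession live here on Linux
#endif

// Hypothetical helper: step 1 of the resumable protocol. The server is assumed to
// reply with an X-Goog-Upload-URL header that the file bytes are then sent to.
func makeUploadStartRequest(apiKey: String, numBytes: Int, mimeType: String,
                            displayName: String) throws -> URLRequest {
  // Assumed v1beta media upload endpoint of the Gemini File API.
  var components = URLComponents(
    string: "https://generativelanguage.googleapis.com/upload/v1beta/files")!
  components.queryItems = [URLQueryItem(name: "key", value: apiKey)]
  var request = URLRequest(url: components.url!)
  request.httpMethod = "POST"
  request.setValue("resumable", forHTTPHeaderField: "X-Goog-Upload-Protocol")
  request.setValue("start", forHTTPHeaderField: "X-Goog-Upload-Command")
  request.setValue(String(numBytes), forHTTPHeaderField: "X-Goog-Upload-Header-Content-Length")
  request.setValue(mimeType, forHTTPHeaderField: "X-Goog-Upload-Header-Content-Type")
  request.setValue("application/json", forHTTPHeaderField: "Content-Type")
  request.httpBody = try JSONEncoder().encode(["file": ["display_name": displayName]])
  return request
}

// Hypothetical helper: step 2 sends the bytes and finalizes in one request.
// URLSession derives Content-Length from httpBody, so it is not set manually.
func makeUploadFinalizeRequest(uploadURL: URL, fileData: Data) -> URLRequest {
  var request = URLRequest(url: uploadURL)
  request.httpMethod = "POST"
  request.setValue("0", forHTTPHeaderField: "X-Goog-Upload-Offset")
  request.setValue("upload, finalize", forHTTPHeaderField: "X-Goog-Upload-Command")
  request.httpBody = fileData
  return request
}

// Decodes only the field needed for fileData from the returned File resource.
struct UploadResponse: Decodable {
  struct File: Decodable { let uri: String }
  let file: File
}

func parseFileURI(from data: Data) throws -> String {
  try JSONDecoder().decode(UploadResponse.self, from: data).file.uri
}

// Runs both steps and returns the file URI to pass to ModelContent.Part.fileData.
func uploadFile(_ fileData: Data, mimeType: String, displayName: String,
                apiKey: String) async throws -> String {
  let start = try makeUploadStartRequest(apiKey: apiKey, numBytes: fileData.count,
                                         mimeType: mimeType, displayName: displayName)
  let (_, startResponse) = try await URLSession.shared.data(for: start)
  // HTTP header names are case-insensitive, so match the upload URL header loosely.
  guard let http = startResponse as? HTTPURLResponse,
        let header = http.allHeaderFields.first(where: {
          ($0.key as? String)?.lowercased() == "x-goog-upload-url"
        }),
        let urlString = header.value as? String,
        let uploadURL = URL(string: urlString) else {
    throw URLError(.badServerResponse)
  }
  let finalize = makeUploadFinalizeRequest(uploadURL: uploadURL, fileData: fileData)
  let (body, _) = try await URLSession.shared.data(for: finalize)
  return try parseFileURI(from: body)
}
```

The string returned by uploadFile would then be passed as the uri argument of ModelContent.Part.fileData(mimetype:uri:), together with the same MIME type used for the upload. Splitting request construction from the network calls keeps the protocol details easy to inspect and test without a live API key.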

Unfortunately, based on our current engineering plan and product backlog, there are no plans to support uploading files with the Swift SDK in the near term. As a potential alternative, the related Vertex AI for Firebase SDK supports media uploaded with the Cloud Storage for Firebase SDK. This guide shows how to use the two SDKs together: https://firebase.google.com/docs/vertex-ai/solutions/cloud-storage