google-gemini / cookbook

Examples and guides for using the Gemini API.
https://ai.google.dev/gemini-api/docs
Apache License 2.0

How to pass PDFs to Gemini using the Python SDK? #29

Closed NielsRogge closed 4 months ago

NielsRogge commented 4 months ago

Hi,

The Gemini Pro Vision and Gemini 1.5 Pro models look very capable from what I've seen trying them out in Google AI Studio. However, the docs on API/Python SDK usage are convoluted and hard to navigate, and there's no guide on how to pass PDFs to Gemini.

What I could find is this:

NielsRogge commented 4 months ago

Looks like this is going to be addressed in #17 as the File API is still in preview

NielsRogge commented 4 months ago

Another thing that is confusing is that some guides are doing this:

!pip install -U -q google-generativeai

import google.generativeai as genai

model = genai.GenerativeModel('models/gemini-pro')

whereas other guides are using this:

!pip install -U -q google-cloud-aiplatform

from vertexai.generative_models import GenerativeModel

multimodal_model = GenerativeModel("gemini-1.0-pro-vision")

Which one is recommended? Why are there 2 Python SDKs?

markmcd commented 4 months ago

There are 2 SDKs because 2 platforms host the Gemini API: one for Google Cloud Platform customers (minimal setup when running in Google Cloud), and one that does not require a GCP account (API key auth).

| Platform | Docs | SDK |
| --- | --- | --- |
| Gemini API | https://ai.google.dev/ | google-generativeai |
| Vertex AI | https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview | google-cloud-aiplatform |

This guide is meant to help disambiguate the two. We try to keep the API surfaces aligned, but some parts are different by design - e.g. authentication.
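
To make the split concrete, here is a minimal sketch that puts both SDKs behind one helper. The function name `make_model` and the platform labels "gemini-api" and "vertex-ai" are made up for illustration; the SDK calls themselves are the ones that appear in this thread. The imports sit inside each branch, so only the SDK you actually use needs to be installed.

```python
def make_model(platform: str, model_name: str, **auth):
    """Return a GenerativeModel from whichever SDK matches the platform.

    `platform` is a hypothetical label, not something either SDK defines.
    """
    if platform == "gemini-api":
        # pip install -U google-generativeai; authenticates with an API key.
        import google.generativeai as genai

        genai.configure(api_key=auth["api_key"])
        return genai.GenerativeModel(model_name)

    if platform == "vertex-ai":
        # pip install -U google-cloud-aiplatform; authenticates through
        # your Google Cloud project credentials.
        import vertexai
        from vertexai.generative_models import GenerativeModel

        vertexai.init(project=auth["project"], location=auth["location"])
        return GenerativeModel(model_name)

    raise ValueError(f"unknown platform: {platform!r}")
```

For example, `make_model("gemini-api", "models/gemini-pro", api_key="...")` would go through the API key path, while `make_model("vertex-ai", "gemini-1.0-pro-vision", project="...", location="...")` would go through Google Cloud.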

If you want to leave feedback for the docs you linked, you can use the "Send feedback" links on the respective pages.

NielsRogge commented 4 months ago

Ok, I've been trying out setting up an API key by following the authentication notebook.

However, Google AI Studio is not available in my region (Belgium). Does it mean I cannot use the Gemini API?

edemir206 commented 4 months ago

@NielsRogge did you find out how to send PDF files to the Gemini API? I'm struggling with this too: I want to send a PDF file via the API for Gemini to summarize. Is it possible, @markmcd?

NielsRogge commented 4 months ago

Hi @edemir206,

here's the code one can use for that:

import vertexai
from vertexai.generative_models import GenerativeModel, Part

# variables for you to fill in
PROJECT_ID = ""
LOCATION = ""
BUCKET_NAME = ""

# initialize Vertex AI
vertexai.init(project=PROJECT_ID, location=LOCATION)

model = GenerativeModel("gemini-1.5-pro-preview-0409")

# get the GCS path of the file; filename is the object's name in the bucket
filename = ""
gcs_path = f"gs://{BUCKET_NAME}/{filename}"

prompt = "Describe the file"

response = model.generate_content(
    [
        Part.from_uri(gcs_path, mime_type="application/pdf"),
        prompt,
    ]
)

print(response.text)

The code can be run once you're authenticated to your Google Cloud project (for example by running gcloud auth application-default login, which sets up the Application Default Credentials the SDK uses).
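
If the file names vary, a small helper can build the gs:// URI and guess the MIME type to pass to Part.from_uri. This is only a sketch: `gcs_uri` and `guess_mime_type` are hypothetical helpers, not part of the Vertex AI SDK, and "my-bucket" is a placeholder. They rely only on the Python standard library.

```python
import mimetypes


def gcs_uri(bucket_name: str, blob_name: str) -> str:
    """Build a gs:// URI from a bucket name and an object name."""
    if not bucket_name or not blob_name:
        raise ValueError("bucket_name and blob_name must be non-empty")
    return f"gs://{bucket_name}/{blob_name.lstrip('/')}"


def guess_mime_type(blob_name: str) -> str:
    """Guess the MIME type from the file extension."""
    mime_type, _ = mimetypes.guess_type(blob_name)
    if mime_type is None:
        raise ValueError(f"cannot guess MIME type for {blob_name!r}")
    return mime_type


uri = gcs_uri("my-bucket", "reports/q1.pdf")  # placeholder bucket and object
mime = guess_mime_type("reports/q1.pdf")
print(uri, mime)
```

The two values can then be passed along as Part.from_uri(uri, mime_type=mime).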

edemir206 commented 4 months ago

I'm using the Google Gemini API, not the Vertex AI API. I'm still a little confused, but I think Gemini API pricing is much lower for my needs.

I tried embedding the PDF as base64-encoded inlineData, but I get an error 500. My JSON is formatted like this:

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "Please summarize"
        },
        {
          "inlineData": {
            "mimeType": "application/pdf",
            "data": "<base64EncodedFileData>"
          }
        }
      ]
    }
  ],
  "generationConfig": {
    "temperature": 0.9,
    "topK": 1,
    "topP": 1,
    "maxOutputTokens": 2048,
    "stopSequences": []
  },
  "safetySettings": [
    {
      "category": "HARM_CATEGORY_HARASSMENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_HATE_SPEECH",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    }
  ]
}

Is this not possible via the Gemini API?
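
For reference, a request body shaped like the JSON above can be assembled in Python with the standard library alone. This is a sketch under assumptions: `build_request_body` is a hypothetical helper, the field names are copied from the JSON above, and nothing here implies that the endpoint accepts inline PDFs. The detail it illustrates is that "data" has to be a base64 string, not raw bytes.

```python
import base64
import json


def build_request_body(pdf_bytes: bytes, prompt: str) -> dict:
    """Build a request body with the PDF inlined as a base64 string."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "contents": [
            {
                "role": "user",
                "parts": [
                    {"text": prompt},
                    {
                        "inlineData": {
                            "mimeType": "application/pdf",
                            "data": encoded,
                        }
                    },
                ],
            }
        ]
    }


# Placeholder bytes; in practice read them with open(path, "rb").read().
body = build_request_body(b"%PDF-1.4 placeholder", "Please summarize")
payload = json.dumps(body)  # serializes cleanly because "data" is a str
```

The generationConfig and safetySettings entries from the JSON above could be added to the returned dict in the same way.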

NielsRogge commented 4 months ago

When you use Gemini's API you can follow the guide here: https://ai.google.dev/gemini-api/docs/prompting_with_media

edemir206 commented 4 months ago

I was able to upload the file using the sample, but it then fails with the error "google.api_core.exceptions.InvalidArgument: 400 Unsupported MIME type: application/pdf"

Here's my code:

import google.generativeai as genai
from IPython.display import Markdown

GOOGLE_API_KEY = ""

genai.configure(api_key=GOOGLE_API_KEY)

sample_file = genai.upload_file(path="/home/user/python/test.pdf",
                                display_name="Sample PDF")

print(f"Uploaded file '{sample_file.display_name}' as: {sample_file.uri}")

model = genai.GenerativeModel(model_name="models/gemini-1.5-pro-latest")

response = model.generate_content(["Please Summarize PDF.", sample_file])

Markdown(">" + response.text)

genai.delete_file(sample_file.name)
print(f'Deleted {sample_file.display_name}.')
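
One weakness of the snippet above is that the uploaded file is never deleted when generate_content raises, as it does here. A possible way around that, sketched below, is to wrap the call in try/finally. `summarize_file` is a hypothetical helper; its `client` argument is assumed to be the `google.generativeai` module after `genai.configure(...)` has been called, and it relies only on calls already used above (`upload_file`, `GenerativeModel`, `generate_content`, `delete_file`). It does not fix the MIME type error itself.

```python
def summarize_file(client, path: str, prompt: str, model_name: str) -> str:
    """Upload a file, ask the model about it, and always delete the upload."""
    uploaded = client.upload_file(path=path)
    try:
        model = client.GenerativeModel(model_name=model_name)
        response = model.generate_content([prompt, uploaded])
        return response.text
    finally:
        # Runs on success and on errors such as InvalidArgument.
        client.delete_file(uploaded.name)
```

With the setup above, a call would look like `summarize_file(genai, "/home/user/python/test.pdf", "Please summarize the PDF.", "models/gemini-1.5-pro-latest")`. Passing the module in as an argument also makes it easy to substitute a stub when testing without network access.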