GoogleCloudPlatform / generative-ai

Sample code and notebooks for Generative AI on Google Cloud, with Gemini on Vertex AI
https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview
Apache License 2.0

[Bug]: FzErrorArgument: code=4: pixmap must be Grayscale, RGB, or CMYK to save as JPEG #676

Open myoshimu opened 1 month ago

myoshimu commented 1 month ago

File Name

https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/intro_multimodal_rag.ipynb

What happened?

The following code failed with the error "FzErrorArgument: code=4: pixmap must be Grayscale, RGB, or CMYK to save as JPEG":

# Extract text and image metadata from the PDF document
text_metadata_df, image_metadata_df = get_document_metadata(
    multimodal_model,  # we are passing gemini 1.0 pro vision model
    pdf_folder_path,
    image_save_dir="images",
    image_description_prompt=image_description_prompt,
    embedding_size=1408,
)

print("\n\n --- Completed processing. ---")
:

Processing page: 1
Processing page: 2
Processing page: 3
Processing page: 4

:

FzErrorArgument                           Traceback (most recent call last)
<ipython-input-8-96bfa690e8cb> in <cell line: 14>()
     12 
     13 # Extract text and image metadata from the PDF document
---> 14 text_metadata_df, image_metadata_df = get_document_metadata(
     15     multimodal_model,  # we are passing gemini 1.0 pro vision model
     16     pdf_folder_path,

4 frames
~/.local/lib/python3.10/site-packages/pymupdf/mupdf.py in fz_write_pixmap_as_jpeg(out, pix, quality, invert_cmyk)
  47578         Write a pixmap as a JPEG.
  47579     """
> 47580     return _mupdf.fz_write_pixmap_as_jpeg(out, pix, quality, invert_cmyk)
  47581 
  47582 def fz_write_pixmap_as_jpx(out, pix, quality):

FzErrorArgument: code=4: pixmap must be Grayscale, RGB, or CMYK to save as JPEG

Relevant log output

I think the get_image_for_gemini() function in gemini/use-cases/retrieval-augmented-generation/utils/intro_multimodal_rag_utils.py should be modified as follows:

import fitz
import os
from typing import Tuple
from PIL import Image

def get_image_for_gemini(
    doc: fitz.Document,
    image: tuple,
    image_no: int,
    image_save_dir: str,
    file_name: str,
    page_num: int,
) -> Tuple[Image, str]:
    """
    Extracts an image from a PDF document, converts it to JPEG format, saves it to a specified directory,
    and loads it as a PIL Image Object.

    Parameters:
    - doc (fitz.Document): The PDF document from which the image is extracted.
    - image (tuple): A tuple containing image information.
    - image_no (int): The image number for naming purposes.
    - image_save_dir (str): The directory where the image will be saved.
    - file_name (str): The base name for the image file.
    - page_num (int): The page number from which the image is extracted.

    Returns:
    - Tuple[Image.Image, str]: A tuple containing the Gemini Image object and the image filename.
    """

    # Extract the image from the document
    xref = image[0]
    pix = fitz.Pixmap(doc, xref)

    # Convert the image to JPEG format
    pix.tobytes("jpeg")

    # Create the image file name
    image_name = f"{image_save_dir}/{file_name}_image_{page_num}_{image_no}_{xref}.jpeg"

    # Create the image save directory if it doesn't exist
    os.makedirs(image_save_dir, exist_ok=True)

    # Save the image to the specified location
    pix.save(image_name)

    # Load the saved image as a Gemini Image Object
    image_for_gemini = Image.load_from_file(image_name)

    return image_for_gemini, image_name
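
A pixmap whose colorspace is not Grayscale, RGB, or CMYK will likely need converting before `pix.save` can write it as a JPEG. The sketch below shows one possible guard. It is not the notebook's code: `ensure_jpeg_compatible` and its `to_rgb` parameter are hypothetical names, and it assumes the decision can be made from the number of color components (`pix.n - pix.alpha`), which may not match MuPDF's internal check exactly. A stand-in object is used so the block runs without PyMuPDF.

```python
from types import SimpleNamespace
from typing import Any, Callable

# Color-component counts passed through unchanged (assumption):
# 1 = grayscale, 3 = RGB. Everything else, including CMYK, is converted
# so that the saved files are uniform.
PASSTHROUGH_COMPONENTS = (1, 3)


def ensure_jpeg_compatible(pix: Any, to_rgb: Callable[[Any], Any]) -> Any:
    """Return pix unchanged if it looks JPEG-safe, else to_rgb(pix).

    Hypothetical helper: pix only needs the attributes n (components per
    pixel, including alpha) and alpha (0 or 1), as a PyMuPDF Pixmap has.
    """
    color_components = pix.n - pix.alpha
    if color_components in PASSTHROUGH_COMPONENTS:
        return pix
    return to_rgb(pix)


# Stand-in for a pixmap with an unusual 5-component colorspace.
unusual = SimpleNamespace(n=5, alpha=0)
converted = ensure_jpeg_compatible(unusual, lambda p: SimpleNamespace(n=3, alpha=0))
print(converted.n)
```

In `get_image_for_gemini`, a call such as `pix = ensure_jpeg_compatible(pix, lambda p: fitz.Pixmap(fitz.csRGB, p))` could go right after `pix = fitz.Pixmap(doc, xref)`. Pixmaps with no colorspace at all (for example image masks) may still fail to convert and might have to be skipped instead.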


krupalsmart97 commented 1 month ago

Hey all, I tried the above code because I was facing the same issue, but it gives the following error:

Unexpected item type: <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=270x184 at 0x7C60195D9CC0>. Only types that represent a single Content or a single Part are supported here.

Not sure if I am doing something wrong.

rocpoc commented 1 month ago

@holtskinner +1, I am seeing this issue too.

I've also been hitting numerous quota issues despite adding:

add_sleep_after_page = True
sleep_time_after_page = 5
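
If a fixed sleep after each page is not enough, one common alternative is to retry the failing call with exponential backoff. The sketch below is generic and not part of the notebook: `call_with_backoff` is a hypothetical helper, and which exception type signals a quota error is an assumption, so it is passed in as `retry_on`.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            # Give up and re-raise once the last attempt has failed.
            if attempt == max_attempts - 1:
                raise
            # The delay doubles on each attempt; random jitter keeps
            # parallel callers from retrying at the same moment.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")
```

The model call inside the per-page loop could then be wrapped as `call_with_backoff(lambda: ..., retry_on=(SomeQuotaError,))`, where `SomeQuotaError` is a placeholder. For Vertex AI, quota errors often surface as `google.api_core.exceptions.ResourceExhausted`, but that is an assumption to confirm against the actual traceback.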