google-gemini / generative-ai-python

The official Python library for the Google Gemini API
https://pypi.org/project/google-generativeai/
Apache License 2.0
1.39k stars, 270 forks

Automatic image blob creation doesn't handle RGBA images with JPEG. #160

Closed FrostyTheSouthernSnowman closed 3 months ago

FrostyTheSouthernSnowman commented 8 months ago

Description of the bug:

Calling generate_content on a Gemini Pro Vision model with a PNG image fails with KeyError: 'RGBA', which in turn causes a second exception, OSError: cannot write mode RGBA as JPEG. This seems to indicate that PNG is not supported, but according to the Gemini API docs, PNG is a supported MIME type. Note that the PNG example from that docs page doesn't seem to work: it passes a contents kwarg to generate_content, but that argument doesn't exist. Modifying the code to use the right arguments gives the error google.api_core.exceptions.InvalidArgument: 400 Request contains an invalid argument.

Actual vs expected behavior:

The expected behavior is for this code:

screenshot = get_screen_data()

prompt = "What are your thoughts on this screenshot? I think"

response = model.generate_content(
    [prompt, screenshot], stream=True
)

response.resolve()

print(response.text)

to work successfully. This code was adapted from the "text from image and text" example in the quickstart. Instead, it raises the KeyError and OSError above. Changing the code to:

screenshot = get_screen_data()

screenshot_data = {
    'mime_type': 'image/png',
    'data': screenshot.tobytes()
}

prompt = "What are your thoughts on this screenshot? I think"

response = model.generate_content(
    [prompt, screenshot_data], stream=True
)

response.resolve()
print(response.text)

raises a 400 error as described above. This code is adapted from the Gemini API Overview.
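One possible cause of the 400 in the second snippet, offered as an assumption rather than something confirmed in this thread: PIL's Image.tobytes() returns raw pixel data, not a PNG-encoded file, so the payload would not match the declared image/png MIME type. A minimal sketch of encoding to PNG in memory first (png_blob is a hypothetical helper name, and the small generated image stands in for the screenshot):

```python
import io

from PIL import Image

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_blob(img):
    """Encode a PIL image as PNG in memory and wrap it in a blob-style dict.

    Hypothetical helper: the dict shape mirrors the one used in the snippet
    above ('mime_type' plus 'data').
    """
    buf = io.BytesIO()
    img.save(buf, format="PNG")  # PNG supports RGBA, so no mode conversion is needed
    return {"mime_type": "image/png", "data": buf.getvalue()}


# Stand-in for the screenshot: a tiny 4x4 RGBA image.
img = Image.new("RGBA", (4, 4), (0, 128, 255, 255))

raw = img.tobytes()   # raw pixels: 4 * 4 * 4 bytes, with no PNG header
blob = png_blob(img)  # a real PNG file held in memory

print(len(raw), raw.startswith(PNG_SIGNATURE))
print(blob["mime_type"], blob["data"].startswith(PNG_SIGNATURE))
```

The resulting dict could then be passed in place of screenshot_data in the call above.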

Any other information you'd like to share?

#112 is related to this. Specifically, it deals with my second attempt at solving this problem. This issue is about the fact that generate_content doesn't handle PNG by default even though it is supposedly supported.

Andy963 commented 6 months ago

It seems that the code in the Gemini API Overview is not correct:

model = genai.GenerativeModel('gemini-pro-vision')

cookie_picture = [{
    'mime_type': 'image/png',
    'data': Path('cookie.png').read_bytes()
}]
prompt = "Do these look store-bought or homemade?"

response = model.generate_content(
    model="gemini-pro-vision", # the model parameter is not needed here
    content=[prompt, cookie_picture]
)
print(response.text)

FrostyTheSouthernSnowman commented 5 months ago

Definitely seems to be the case

MarkDaoust commented 3 months ago

In my tests PNG is working fine.

IDK what your screenshot = get_screen_data() function is.

Can you share a colab that reproduces the problem?

> it seems that the code in Gemini API Overview is not correct,

Thanks, I'm sending a fix for this.

ya-stack commented 3 months ago

Hi, I am trying to read image from https: URL, but it seems to be not working, it's showing below error: ChatGoogleGenerativeAIError: Invalid argument provided to Gemini: 400 Add an image to use models/gemini-pro-vision, or switch your model to a text model.

FrostyTheSouthernSnowman commented 3 months ago

> In my tests PNG is working fine.
>
> IDK what your screenshot = get_screen_data() function is.
>
> Can you share a colab that reproduces the problem?
>
> > it seems that the code in Gemini API Overview is not correct,
>
> Thanks, I'm sending a fix for this.

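Here's get_screen_data():

```python
from PIL import ImageGrab

def get_screen_data():
    # Note: primary_monitor_dimensions, draw_mouse, and save_screenshot are
    # defined elsewhere (not shown in this snippet).
    screen = ImageGrab.grab(bbox=(0, 0, *primary_monitor_dimensions))
    screen = draw_mouse(screen)
    # Downscale to half size in each dimension.
    screen = screen.resize((int(screen.size[0] / 2), int(screen.size[1] / 2)))
    if save_screenshot:
        screen.save('screen.png')
    return screen
```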

ImageGrab comes from PIL.

MarkDaoust commented 3 months ago

This happens because the code that generates the bytes to send tries to create a JPEG file, but the image is RGBA. Adding a .convert('RGB') before saving it fixes this.

In [13]: model = genai.GenerativeModel(model_name='gemini-pro-vision')

In [14]: model.generate_content([img2, "what's this"])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/Projects/venv3/lib/python3.11/site-packages/PIL/JpegImagePlugin.py:650, in _save(im, fp, filename)
    649 try:
--> 650     rawmode = RAWMODE[im.mode]
    651 except KeyError as e:

KeyError: 'RGBA'

The above exception was the direct cause of the following exception:

OSError                                   Traceback (most recent call last)
Cell In[14], line 1
----> 1 model.generate_content([img2, "what's this"])

File ~/Projects/generative-ai-python/google/generativeai/generative_models.py:236, in GenerativeModel.generate_content(self, contents, generation_config, safety_settings, stream, tools, tool_config, request_options)
    233 if not contents:
    234     raise TypeError("contents must not be empty")
--> 236 request = self._prepare_request(
    237     contents=contents,
    238     generation_config=generation_config,
    239     safety_settings=safety_settings,
    240     tools=tools,
    241     tool_config=tool_config,
    242 )
    243 if self._client is None:
    244     self._client = client.get_default_generative_client()

File ~/Projects/generative-ai-python/google/generativeai/generative_models.py:139, in GenerativeModel._prepare_request(self, contents, generation_config, safety_settings, tools, tool_config)
    136 else:
    137     tool_config = content_types.to_tool_config(tool_config)
--> 139 contents = content_types.to_contents(contents)
    141 generation_config = generation_types.to_generation_config_dict(generation_config)
    142 merged_gc = self._generation_config.copy()

File ~/Projects/generative-ai-python/google/generativeai/types/content_types.py:293, in to_contents(contents)
    288     except TypeError:
    289         # If you get a TypeError here it's probably because that was a list
    290         # of parts, not a list of contents, so fall back to `to_content`.
    291         pass
--> 293 contents = [to_content(contents)]
    294 return contents

File ~/Projects/generative-ai-python/google/generativeai/types/content_types.py:256, in to_content(content)
    254     return content
    255 elif isinstance(content, Iterable) and not isinstance(content, str):
--> 256     return protos.Content(parts=[to_part(part) for part in content])
    257 else:
    258     # Maybe this is a Part?
    259     return protos.Content(parts=[to_part(content)])

File ~/Projects/generative-ai-python/google/generativeai/types/content_types.py:256, in <listcomp>(.0)
    254     return content
    255 elif isinstance(content, Iterable) and not isinstance(content, str):
--> 256     return protos.Content(parts=[to_part(part) for part in content])
    257 else:
    258     # Maybe this is a Part?
    259     return protos.Content(parts=[to_part(content)])

File ~/Projects/generative-ai-python/google/generativeai/types/content_types.py:224, in to_part(part)
    220     return protos.Part(function_response=part)
    222 else:
    223     # Maybe it can be turned into a blob?
--> 224     return protos.Part(inline_data=to_blob(part))

File ~/Projects/generative-ai-python/google/generativeai/types/content_types.py:164, in to_blob(blob)
    162     return blob
    163 elif isinstance(blob, IMAGE_TYPES):
--> 164     return image_to_blob(blob)
    165 else:
    166     if isinstance(blob, Mapping):

File ~/Projects/generative-ai-python/google/generativeai/types/content_types.py:89, in image_to_blob(image)
     87 if PIL is not None:
     88     if isinstance(image, PIL.Image.Image):
---> 89         return pil_to_blob(image)
     91 if IPython is not None:
     92     if isinstance(image, IPython.display.Image):

File ~/Projects/generative-ai-python/google/generativeai/types/content_types.py:79, in pil_to_blob(img)
     77     mime_type = "image/png"
     78 else:
---> 79     img.save(bytesio, format="JPEG")
     80     mime_type = "image/jpeg"
     81 bytesio.seek(0)

File ~/Projects/venv3/lib/python3.11/site-packages/PIL/Image.py:2439, in Image.save(self, fp, format, **params)
   2436         fp = builtins.open(filename, "w+b")
   2438 try:
-> 2439     save_handler(self, fp, filename)
   2440 except Exception:
   2441     if open_fp:

File ~/Projects/venv3/lib/python3.11/site-packages/PIL/JpegImagePlugin.py:653, in _save(im, fp, filename)
    651 except KeyError as e:
    652     msg = f"cannot write mode {im.mode} as JPEG"
--> 653     raise OSError(msg) from e
    655 info = im.encoderinfo
    657 dpi = [round(x) for x in info.get("dpi", (0, 0))]
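
The workaround described above can be sketched as a small pre-processing step applied to the image before it is handed to generate_content. This is an illustration using only Pillow, not the library's actual patch; flatten_to_rgb and jpeg_bytes are hypothetical helper names. A plain img.convert('RGB') drops the alpha channel and keeps whatever colour is stored under transparent pixels, so the sketch composites onto a white background instead:

```python
import io

from PIL import Image


def flatten_to_rgb(img, background=(255, 255, 255)):
    """Return an RGB version of img, compositing any alpha onto a solid colour."""
    if img.mode == "RGB":
        return img
    has_alpha = img.mode in ("RGBA", "LA") or (
        img.mode == "P" and "transparency" in img.info
    )
    if has_alpha:
        rgba = img.convert("RGBA")
        canvas = Image.new("RGB", rgba.size, background)
        # Use the alpha band as the paste mask so transparency blends
        # into the background colour.
        canvas.paste(rgba, mask=rgba.getchannel("A"))
        return canvas
    return img.convert("RGB")


def jpeg_bytes(img):
    """Encode img as JPEG in memory, flattening to RGB first."""
    buf = io.BytesIO()
    flatten_to_rgb(img).save(buf, format="JPEG")
    return buf.getvalue()


# Reproduce the failure from the traceback with a small RGBA image.
rgba_img = Image.new("RGBA", (8, 8), (255, 0, 0, 128))
try:
    rgba_img.save(io.BytesIO(), format="JPEG")
    raised_on_rgba = False
except OSError as exc:
    raised_on_rgba = True
    print("direct save failed:", exc)

flat = flatten_to_rgb(rgba_img)
data = jpeg_bytes(rgba_img)
print(flat.mode, len(data) > 0)
```

With a helper like this, the call in the report would become model.generate_content([prompt, flatten_to_rgb(screenshot)]), assuming screenshot is a PIL image as returned by ImageGrab.grab.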