Closed FrostyTheSouthernSnowman closed 3 months ago
it seems that the code in Gemini API Overview is not correct,
```
model = genai.GenerativeModel('gemini-pro-vision')
cookie_picture = [{
    'mime_type': 'image/png',
    'data': Path('cookie.png').read_bytes()
}]
prompt = "Do these look store-bought or homemade?"
response = model.generate_content(
    model="gemini-pro-vision",  # the model parameter is not needed here
    content=[prompt, cookie_picture]
)
print(response.text)
```
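For comparison, here is a minimal sketch of how the request could be assembled, going by the `generate_content(self, contents, ...)` signature visible in the traceback in this thread: the contents are passed as one flat list and no `model` argument is given. `build_contents` is a hypothetical helper name, not part of the library, and the commented-out call assumes the `google-generativeai` package with a configured API key. It has not been verified against the live API.

```python
def build_contents(prompt, image_bytes, mime_type="image/png"):
    """Return a flat [text, image] list to pass as the `contents` argument.

    Hypothetical helper: the image is described by a plain dict with
    'mime_type' and 'data' keys, as in the docs example, but it is NOT
    wrapped in an extra list.
    """
    image_part = {"mime_type": mime_type, "data": image_bytes}
    return [prompt, image_part]


# Assumed usage (needs google-generativeai and an API key, so it is left
# commented out here):
#
#   from pathlib import Path
#   import google.generativeai as genai
#
#   model = genai.GenerativeModel("gemini-pro-vision")
#   contents = build_contents(
#       "Do these look store-bought or homemade?",
#       Path("cookie.png").read_bytes(),
#   )
#   response = model.generate_content(contents)
#   print(response.text)
```

The only changes from the docs snippet are dropping the `model=` keyword and passing the list positionally instead of as `content=`.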
Definitely seems to be the case
In my tests PNG is working fine.
IDK what your `screenshot = get_screen_data()` function is.
Can you share a colab that reproduces the problem?
it seems that the code in Gemini API Overview is not correct,
Thanks, I'm sending a fix for this.
Hi, I am trying to read image from https: URL, but it seems to be not working, it's showing below error: ChatGoogleGenerativeAIError: Invalid argument provided to Gemini: 400 Add an image to use models/gemini-pro-vision, or switch your model to a text model.
Here's the body of `get_screen_data()`:
```
screen = ImageGrab.grab(bbox=(0, 0, *primary_monitor_dimensions))
screen = draw_mouse(screen)
screen = screen.resize((int(screen.size[0] / 2), int(screen.size[1] / 2)))
if save_screenshot:
    screen.save('screen.png')
return screen
```
ImageGrab comes from PIL.
Marking this issue as stale since it has been open for 14 days with no activity. This issue will be closed if no further activity occurs.
This happens because the code that generates the bytes to send tries to create a JPEG file, but the image is RGBA.
Adding a `.convert('RGB')` before saving it fixes this.
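A small sketch of the failure and the fix, using only Pillow and no API call. `to_jpeg_bytes` is a hypothetical helper written for illustration, not the library's own `pil_to_blob`; it assumes that dropping the alpha channel is acceptable for the image being sent.

```python
import io

from PIL import Image


def to_jpeg_bytes(img):
    """Encode a PIL image as JPEG, converting to RGB first when needed."""
    if img.mode != "RGB":
        # JPEG has no alpha channel, so RGBA must be converted before saving.
        img = img.convert("RGB")
    buf = io.BytesIO()
    img.save(buf, format="JPEG")
    return buf.getvalue()


# A tiny RGBA image standing in for a screenshot.
rgba = Image.new("RGBA", (4, 4), (255, 0, 0, 128))

# Saving RGBA directly as JPEG raises the OSError seen in the traceback.
try:
    rgba.save(io.BytesIO(), format="JPEG")
    direct_save_failed = False
except OSError:
    direct_save_failed = True

# With the conversion in place, the same image encodes without error.
jpeg_bytes = to_jpeg_bytes(rgba)
```

In the screenshot case from this thread, the equivalent change would be calling `.convert('RGB')` on the image returned by `get_screen_data()` before handing it to `generate_content`.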
In [13]: model = genai.GenerativeModel(model_name='gemini-pro-vision')
In [14]: model.generate_content([img2, "what's this"])
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/Projects/venv3/lib/python3.11/site-packages/PIL/JpegImagePlugin.py:650, in _save(im, fp, filename)
649 try:
--> 650 rawmode = RAWMODE[im.mode]
651 except KeyError as e:
KeyError: 'RGBA'
The above exception was the direct cause of the following exception:
OSError Traceback (most recent call last)
Cell In[14], line 1
----> 1 model.generate_content([img2, "what's this"])
File ~/Projects/generative-ai-python/google/generativeai/generative_models.py:236, in GenerativeModel.generate_content(self, contents, generation_config, safety_settings, stream, tools, tool_config, request_options)
233 if not contents:
234 raise TypeError("contents must not be empty")
--> 236 request = self._prepare_request(
237 contents=contents,
238 generation_config=generation_config,
239 safety_settings=safety_settings,
240 tools=tools,
241 tool_config=tool_config,
242 )
243 if self._client is None:
244 self._client = client.get_default_generative_client()
File ~/Projects/generative-ai-python/google/generativeai/generative_models.py:139, in GenerativeModel._prepare_request(self, contents, generation_config, safety_settings, tools, tool_config)
136 else:
137 tool_config = content_types.to_tool_config(tool_config)
--> 139 contents = content_types.to_contents(contents)
141 generation_config = generation_types.to_generation_config_dict(generation_config)
142 merged_gc = self._generation_config.copy()
File ~/Projects/generative-ai-python/google/generativeai/types/content_types.py:293, in to_contents(contents)
288 except TypeError:
289 # If you get a TypeError here it's probably because that was a list
290 # of parts, not a list of contents, so fall back to `to_content`.
291 pass
--> 293 contents = [to_content(contents)]
294 return contents
File ~/Projects/generative-ai-python/google/generativeai/types/content_types.py:256, in to_content(content)
254 return content
255 elif isinstance(content, Iterable) and not isinstance(content, str):
--> 256 return protos.Content(parts=[to_part(part) for part in content])
257 else:
258 # Maybe this is a Part?
259 return protos.Content(parts=[to_part(content)])
File ~/Projects/generative-ai-python/google/generativeai/types/content_types.py:256, in <listcomp>(.0)
254 return content
255 elif isinstance(content, Iterable) and not isinstance(content, str):
--> 256 return protos.Content(parts=[to_part(part) for part in content])
257 else:
258 # Maybe this is a Part?
259 return protos.Content(parts=[to_part(content)])
File ~/Projects/generative-ai-python/google/generativeai/types/content_types.py:224, in to_part(part)
220 return protos.Part(function_response=part)
222 else:
223 # Maybe it can be turned into a blob?
--> 224 return protos.Part(inline_data=to_blob(part))
File ~/Projects/generative-ai-python/google/generativeai/types/content_types.py:164, in to_blob(blob)
162 return blob
163 elif isinstance(blob, IMAGE_TYPES):
--> 164 return image_to_blob(blob)
165 else:
166 if isinstance(blob, Mapping):
File ~/Projects/generative-ai-python/google/generativeai/types/content_types.py:89, in image_to_blob(image)
87 if PIL is not None:
88 if isinstance(image, PIL.Image.Image):
---> 89 return pil_to_blob(image)
91 if IPython is not None:
92 if isinstance(image, IPython.display.Image):
File ~/Projects/generative-ai-python/google/generativeai/types/content_types.py:79, in pil_to_blob(img)
77 mime_type = "image/png"
78 else:
---> 79 img.save(bytesio, format="JPEG")
80 mime_type = "image/jpeg"
81 bytesio.seek(0)
File ~/Projects/venv3/lib/python3.11/site-packages/PIL/Image.py:2439, in Image.save(self, fp, format, **params)
2436 fp = builtins.open(filename, "w+b")
2438 try:
-> 2439 save_handler(self, fp, filename)
2440 except Exception:
2441 if open_fp:
File ~/Projects/venv3/lib/python3.11/site-packages/PIL/JpegImagePlugin.py:653, in _save(im, fp, filename)
651 except KeyError as e:
652 msg = f"cannot write mode {im.mode} as JPEG"
--> 653 raise OSError(msg) from e
655 info = im.encoderinfo
657 dpi = [round(x) for x in info.get("dpi", (0, 0))]
Description of the bug:
Calling `generate_content` on a Gemini Pro Vision model returns an error when it receives a PNG image, saying `KeyError: 'RGBA'`, which causes another exception saying `OSError: cannot write mode RGBA as JPEG`. This seems to indicate that PNG is not supported, but according to the Gemini API docs, PNG is a supported MIME type. Note that the PNG example from that docs page doesn't seem to work. It uses a `contents` kwarg to `generate_content`, but that argument doesn't exist. Modifying the code to use the right arguments gives the error `google.api_core.exceptions.InvalidArgument: 400 Request contains an invalid argument.`
Actual vs expected behavior:
The expected behavior is for this code:
to work successfully. This code was modified from the text from image and text example in the quickstart. Instead, it outputs the KeyError and OSError above. Changing the code to:
raises a 400 error as described above. This code is modified from the Gemini API Overview.
Any other information you'd like to share?
#112 is related to this. Specifically, it deals with my second attempt at solving this problem. This issue is about the fact that `generate_content` doesn't handle PNG by default even though it is supposedly supported.
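One possible workaround, sketched under assumptions: since the traceback shows the library re-encoding a PIL image as JPEG, the image could instead be encoded to PNG bytes up front and passed as a dict with an explicit `image/png` MIME type, the same shape the docs example uses. `pil_to_png_part` is a hypothetical helper name, and whether the API then accepts the request has not been verified here.

```python
import io

from PIL import Image

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def pil_to_png_part(img):
    """Encode a PIL image as PNG and wrap it in a mime_type/data dict."""
    buf = io.BytesIO()
    # PNG supports an alpha channel, so an RGBA image needs no conversion.
    img.save(buf, format="PNG")
    return {"mime_type": "image/png", "data": buf.getvalue()}


# A tiny RGBA image standing in for the screenshot from this issue.
screenshot = Image.new("RGBA", (8, 8), (0, 128, 255, 200))
part = pil_to_png_part(screenshot)

# Assumed usage (needs google-generativeai and an API key):
#   model.generate_content([part, "what's this"])
```

Unlike converting to RGB, this keeps the alpha channel, at the cost of a larger payload than JPEG for photographic content.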