google-gemini / generative-ai-python

The official Python library for the Google Gemini API
https://pypi.org/project/google-generativeai/
Apache License 2.0
1.45k stars 280 forks

(google-generativeai: 0.8.1) Sending a transparent PNG, but it looks like "gemini-pro" converts it to JPG. #567

Open Pjumpod opened 6 days ago

Pjumpod commented 6 days ago

Description of the bug:

My code is

> if imageext.upper() == ".PNG":
>         print("Make blank")
>         rez_img = rez_img.convert("RGBA")
>         print(rez_img.mode)
>         resize_img_path = os.path.join(save_path,"rez_" + os.path.basename(img_path))
>         rez_img.save(resize_img_path)
>         rez_img = pilimg.open(img_path)
>         print(getattr(rez_img, "get_format_mimetype", None))
>     model_use = genai.GenerativeModel(model_name=model)
>     try:
>         response = model_use.generate_content([system_prompt, rez_img], safety_settings=safety_settings)
>         response_text = str(response.text)
>     except Exception as e:
>         response_text = str(f"{e}")

and here is the output:

> Make blank
> RGBA
> <bound method ImageFile.get_format_mimetype of <PIL.PngImagePlugin.PngImageFile image mode=RGBA size=3375x1894 at 0x1A0F83C6E80>>

According to this link, generate_content should upload the image as a PNG with its transparency preserved, as shown in #523. But when I got the output of "describe the image", it contained the words "on black background", which means the RGBA PNG was converted to RGB.

Actual vs expected behavior:

Expected: the image is uploaded as a PNG in RGBA mode. Actual: it is still treated as RGB.

Any other information you'd like to share?

google-generativeai 0.8.1

BytesIO may not be able to decode the alpha channel of an image. I attached this in the code review, and here is a related answer from Stack Overflow.

manojssmk commented 5 days ago

Hi @Pjumpod

I've tested the code and images you mentioned, and it works correctly, producing a light blue background. You can check it out in this gist. I don't believe the issue lies with BytesIO. The format parameter used when saving the PIL image ensures that transparency is preserved.
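One way to check the client side of this claim without relying on the model's description is to inspect the encoded bytes themselves. The following is a minimal sketch, assuming the bytes are a standard PNG stream; the helper names `png_color_type` and `png_has_alpha_channel` are hypothetical and not part of the SDK or of PIL. It reads the colour type from the IHDR chunk, where the PNG format uses 4 for greyscale with alpha and 6 for truecolour with alpha.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# PNG colour types that carry a full alpha channel.
ALPHA_COLOR_TYPES = {4: "greyscale with alpha", 6: "truecolour with alpha"}


def png_color_type(data: bytes) -> int:
    """Return the colour type stored in the IHDR chunk of PNG-encoded bytes."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG stream")
    # After the signature: 4-byte length, 4-byte chunk type, then the IHDR data.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("first chunk is not a valid IHDR")
    # IHDR data starts with width, height, bit depth, colour type.
    _width, _height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return color_type


def png_has_alpha_channel(data: bytes) -> bool:
    """True if the PNG header declares a per-pixel alpha channel."""
    return png_color_type(data) in ALPHA_COLOR_TYPES
```

With PIL, the bytes could be obtained by saving into an `io.BytesIO()` buffer with `format="PNG"` and calling `getvalue()` on it. Note that this only confirms what the client encodes, not what the API backend does with it, and it does not detect palette images (colour type 3) that express transparency through a separate tRNS chunk.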

Thanks

Pjumpod commented 5 days ago

@manojssmk the light blue shown in this issue is the background color of my app, not part of the actual picture.

The picture should not have any background (it is transparent).

Pjumpod commented 5 days ago

@manojssmk you might try your gist again with my test set of pictures: btc, A, F, R, T

Pjumpod commented 4 days ago

@manojssmk you have to use a picture in RGBA mode with a blank background (the background should not have any color).

If you get a light blue background, that also shows the answer is still wrong.

manojssmk commented 4 days ago

Hi @Pjumpod

Yes, you're correct. The image with a blank background that was passed to the model is producing an incorrect output, showing the background as black. You can find the code in this gist.

Thanks

Pjumpod commented 4 days ago

> Hi @Pjumpod
>
> Yes, you're correct. The image with a blank background that was passed to the model is producing an incorrect output, showing the background as black. You can find the code in this gist.
>
> Thanks

@manojssmk @MarkDaoust this is great, now we are in sync. I think this could be fixed on the server side by handling the picture in RGBA mode according to its MIME type. Or is there anything that can be fixed in the API on the client side? I am not sure.

MarkDaoust commented 4 days ago

I haven't looked into this. But the behavior will be affected by https://github.com/google-gemini/generative-ai-python/pull/570. That PR ensures that we don't process the images before sending them.

Try installing from main:

pip install git+https://github.com/google-gemini/generative-ai-python

But it's possible that the PR doesn't change anything: the API may handle the alpha channel by showing the model the picture over a black background. If the API isn't passing an actual alpha channel, there's not much I can do in the SDK.

MarkDaoust commented 4 days ago

Testing a bit, I'm just not convinced that the model uses the alpha channel at all.

b/369593779

Pjumpod commented 4 days ago

> Testing a bit, I'm just not convinced that the model uses the alpha channel at all.
>
>   • If I make an image totally transparent the model still describes it.
>   • If I ask it why I can't see anything in the image pro says "I can see it, maybe there's something wrong with your display"
>   • If I set different colors of transparent sections, the model reports the "correct" background color.
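These observations are consistent with the alpha value being ignored somewhere before the model sees the image, so that whatever RGB is stored under a transparent pixel becomes visible. The pure-Python sketch below only illustrates that idea; the pixel values and the function names `drop_alpha` and `flatten` are made up for the example, straight (non-premultiplied) alpha is assumed, and nothing here is confirmed to be what the backend actually does. `flatten` shows the alternative of compositing over a chosen opaque background on the client, which would at least make the background deterministic.

```python
from typing import List, Tuple

RGBA = Tuple[int, int, int, int]
RGB = Tuple[int, int, int]


def drop_alpha(pixels: List[RGBA]) -> List[RGB]:
    """Discard alpha and keep whatever RGB is stored underneath."""
    return [(r, g, b) for r, g, b, _a in pixels]


def flatten(pixels: List[RGBA], background: RGB) -> List[RGB]:
    """Composite each pixel over an opaque background (the "over" operator)."""
    out = []
    for r, g, b, a in pixels:
        alpha = a / 255  # 0.0 = fully transparent, 1.0 = fully opaque
        out.append(tuple(
            round(fg * alpha + bg * (1 - alpha))
            for fg, bg in zip((r, g, b), background)
        ))
    return out


pixels = [
    (0, 0, 0, 0),        # fully transparent, black stored underneath
    (173, 216, 230, 0),  # fully transparent, light blue stored underneath
    (255, 0, 0, 255),    # opaque red
]
print(drop_alpha(pixels))                 # hidden colours become visible
print(flatten(pixels, (255, 255, 255)))   # transparent pixels become white
```

With PIL, a comparable flattening could be done with `Image.alpha_composite` over a solid `Image.new("RGBA", img.size, color)` image followed by `.convert("RGB")`. This does not make the model aware of transparency; it only controls which background it sees.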

Do you have any idea, or can Google help?

MarkDaoust commented 4 days ago

I think this is happening in the API backend. I think there's nothing we can do from out here.

Pjumpod commented 3 days ago

Do you have any idea how to report this bug to the backend team?

MarkDaoust commented 3 days ago

@Pjumpod, I did this morning. The b/369593779 in my previous message was an internal bug reference.