GoogleCloudPlatform / firebase-extensions

Apache License 2.0

Multimodal Tasks with the Gemini API Won't Analyze any Image #492

Closed TuncayEnsi closed 5 months ago

TuncayEnsi commented 5 months ago

[REQUIRED] Step 2: Describe your configuration

[REQUIRED] Step 3: Describe the problem

When I add an image of any type, from any source, in any size or format (even as base64), the image is not analyzed by Gemini. Instead, the model hallucinates a random image as if one had been given to it.

Steps to reproduce:

Use my configuration and add a document containing an imageURL field (that is the name I chose) to the configured collection. You can either reference the field as {{ImageURL}} in the base prompt in the extension config, or supply it by writing it in the document. Use any kind of image from any source, whether from the open web or from an open storage bucket in the same Firebase project.
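
The setup described in these steps can be sketched roughly as follows. This is an illustration only, not the extension's code: the `render_prompt` helper, the `prompt` field, the base prompt text, and the example URL are all hypothetical, and the sketch assumes placeholders are filled in by plain text substitution. It only shows how a `{{ImageURL}}` placeholder would be replaced with a document field; it does not fetch or upload the image.

```python
import re


def render_prompt(template: str, document: dict) -> str:
    """Replace each {{field}} placeholder with the matching document value."""

    def substitute(match: re.Match) -> str:
        field = match.group(1)
        # Leave unknown placeholders untouched so a missing field is easy to spot.
        return str(document.get(field, match.group(0)))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Hypothetical document, as it might be written to the watched collection.
document = {
    "prompt": "Describe what is shown in this image.",
    "ImageURL": "https://example.com/cat.png",
}

# Hypothetical base prompt from the extension config.
base_prompt = "{{prompt}} Image: {{ImageURL}}"

rendered = render_prompt(base_prompt, document)
print(rendered)
```

Note that placeholder names in this sketch are case-sensitive, so the field name in the document and the name used in the prompt have to match exactly.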

Expected result

The extension should process the prompt together with the image and return an analysis of the image that takes the prompt into account.

Actual result

The model either vividly describes some random image as if it had been given one, or just replies: "Give me the image or a URL to it."

cabljac commented 5 months ago

Thanks! Looking into this now :)