google-gemini / cookbook

Examples and guides for using the Gemini API.
https://ai.google.dev/gemini-api/docs
Apache License 2.0

Run on Multiple Images #201

Open sauravsinhaa opened 1 month ago

sauravsinhaa commented 1 month ago

Description of the feature request:

I can't run the model on multiple images. I am trying to run it on many images and store their output, but the saved output says: { "image": "800040_2.pdf_4.jpg", "description": "I am sorry, I cannot access or process any external files, including images. Therefore, I am unable to extract information from the provided image. \n\nTo assist you, I need the text content of the menu itself. If you can copy and paste the text here, I can help you process it and return the data in the JSON format you requested. \n" }

What problem are you trying to solve with this feature?

When I run it on a single image, it works. Please let me know what I have done wrong.

Any other information you'd like to share?

import os
import json
import google.generativeai as genai
from google.colab import userdata

GOOGLE_API_KEY = userdata.get('secretName')
genai.configure(api_key=GOOGLE_API_KEY)
model = genai.GenerativeModel('gemini-1.5-flash')

image_directory = '/content/unzipped_files/New folder/'

for filename in os.listdir(image_directory):
    if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
        img_path = os.path.join(image_directory, filename)

        # Define your prompt
        prompt = """This image contains the restaurant menu we want to extract the dish name, section , price, description. section will be available at top only if not give blank section. Return output in json format:
                    {section: section , dish name: dish name, price:price, description:description}"""

        # Generate content using the model
        response = model.generate_content([prompt, img_path])  # Pass prompt and image path

        # Create a dictionary for this image
        image_data = {
            'image': filename,
            'description': response.text  # Access the text from the response
        }

        # Save the dictionary as a JSON file
        json_filename = os.path.splitext(filename)[0] + '.json'  # Remove extension and add .json
        json_path = os.path.join(image_directory, json_filename)
        with open(json_path, 'w') as f:
            json.dump(image_data, f, indent=4)

print("JSON files created successfully!")
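A likely cause, going by the reply the model gave: `img_path` is a plain string, and the `google-generativeai` SDK sends a `str` part as ordinary text. The model therefore receives only the prompt plus a filename and has no image to read. (The single-image version is not shown, so this is inferred from the loop above.) Below is a minimal sketch of the loop that passes the image itself. The model call and the image loader are passed in as callables so the file-handling logic can run without an API key; `describe_images`, `generate`, and `load_image` are hypothetical names for illustration, not part of the SDK.

```python
import json
import os
from typing import Any, Callable, List

# Extensions treated as images, matching the question's filter.
IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg')


def describe_images(
    image_directory: str,
    prompt: str,
    generate: Callable[[List[Any]], str],
    load_image: Callable[[str], Any],
) -> List[str]:
    """Call `generate` once per image and save each reply as <name>.json.

    `load_image` should return the image itself (for example a
    PIL.Image.Image), not its path: a bare str part reaches the model as
    text. Returns the paths of the JSON files that were written.
    """
    written = []
    # Snapshot and sort the listing so the output order is deterministic.
    for filename in sorted(os.listdir(image_directory)):
        if not filename.lower().endswith(IMAGE_EXTENSIONS):
            continue  # skip non-image files, such as earlier .json output
        img_path = os.path.join(image_directory, filename)

        image = load_image(img_path)      # the image object, not the path string
        text = generate([prompt, image])  # prompt and image as separate parts

        json_path = os.path.join(
            image_directory, os.path.splitext(filename)[0] + '.json'
        )
        with open(json_path, 'w') as f:
            json.dump({'image': filename, 'description': text}, f, indent=4)
        written.append(json_path)
    return written
```

With the setup from the question, this could be wired up as `describe_images(image_directory, prompt, lambda parts: model.generate_content(parts).text, PIL.Image.open)` after `import PIL.Image`, since the SDK accepts `PIL.Image.Image` objects as content parts. Uploading each file with `genai.upload_file(img_path)` and passing the returned file object instead of the path is another option.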

singhniraj08 commented 1 month ago

@sauravsinhaa, thank you for reporting this issue. This repository is for issues related to the Cookbook guides and examples for the Gemini API. The output is generated by the Gemini API itself, so for issues related to the Gemini API we suggest using the "Send Feedback" option in the Gemini docs (ref: screenshot below). You can also post this issue on the Discourse forum.

[Screenshot]