google-gemini / cookbook

Examples and guides for using the Gemini API
https://ai.google.dev/gemini-api/docs
Apache License 2.0

Run on Multiple Images #201

Open sauravsinhaa opened 5 months ago

sauravsinhaa commented 5 months ago

Description of the feature request:

I can't run the model on multiple images. I am trying to run it on many images and store their output, but each result looks like this: { "image": "800040_2.pdf_4.jpg", "description": "I am sorry, I cannot access or process any external files, including images. Therefore, I am unable to extract information from the provided image. \n\nTo assist you, I need the text content of the menu itself. If you can copy and paste the text here, I can help you process it and return the data in the JSON format you requested. \n" }

What problem are you trying to solve with this feature?

When I run it on a single image, it works. Please let me know what I have done wrong.

Any other information you'd like to share?

import os
import json
import google.generativeai as genai
from google.colab import userdata

GOOGLE_API_KEY = userdata.get('secretName')
genai.configure(api_key=GOOGLE_API_KEY)
model = genai.GenerativeModel('gemini-1.5-flash')

image_directory = '/content/unzipped_files/New folder/'

for filename in os.listdir(image_directory):
    if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
        img_path = os.path.join(image_directory, filename)

        # Define your prompt
        prompt = """This image contains the restaurant menu we want to extract the dish name, section , price, description. section will be available at top only if not give blank section. Return output in json format:
                    {section: section , dish name: dish name, price:price, description:description}"""

        # Generate content using the model
        response = model.generate_content([prompt, img_path])  # Pass prompt and image path

        # Create a dictionary for this image
        image_data = {
            'image': filename,
            'description': response.text  # Access the text from the response
        }

        # Save the dictionary as a JSON file
        json_filename = os.path.splitext(filename)[0] + '.json'  # Remove extension and add .json
        json_path = os.path.join(image_directory, json_filename)
        with open(json_path, 'w') as f:
            json.dump(image_data, f, indent=4)

print("JSON files created successfully!")
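For anyone hitting the same message: a likely cause, judging only from the code as posted, is that `generate_content` receives `img_path` as a plain string, so the model sees a file name as text rather than the image itself. The sketch below shows one possible fix, which is to open each file with Pillow and pass the image object instead. The helper names `make_gemini_describer` and `process_directory` and the constants are hypothetical, introduced only for illustration; the Gemini call assumes the `google-generativeai` and `Pillow` packages are installed, and it has not been run against the reporter's data.

```python
import json
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}

# Shortened stand-in for the prompt in the report (hypothetical wording).
PROMPT = "Extract section, dish name, price and description from this menu as JSON."


def make_gemini_describer(api_key, model_name="gemini-1.5-flash"):
    """Return a callable that sends the prompt plus the decoded image to Gemini."""
    # Imported lazily so the directory loop below can be exercised without these packages.
    import google.generativeai as genai
    import PIL.Image

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(model_name)

    def describe(prompt, image_path):
        # Key change: pass the opened image object, not the path string.
        with PIL.Image.open(image_path) as img:
            response = model.generate_content([prompt, img])
        return response.text

    return describe


def process_directory(image_dir, describe, prompt=PROMPT):
    """Write one JSON file next to each image and return the paths written."""
    written = []
    for path in sorted(Path(image_dir).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # Skip non-image files, including earlier JSON output.
        record = {"image": path.name, "description": describe(prompt, path)}
        json_path = path.with_suffix(".json")
        json_path.write_text(json.dumps(record, indent=4))
        written.append(json_path)
    return written
```

In a Colab session this might be used as `process_directory(image_directory, make_gemini_describer(GOOGLE_API_KEY))`. Taking the model call as a parameter is a design choice of this sketch, so the file loop can be checked with a stub before spending API quota.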

singhniraj08 commented 5 months ago

@sauravsinhaa, thank you for reporting this issue. This repository is for issues related to the Cookbook guides and examples for the Gemini API. The output here is generated by the Gemini API itself; for issues with the API, we suggest using the "Send Feedback" option in the Gemini docs (see the screenshot below). You can also post this issue on the Discourse forum.

image

github-actions[bot] commented 3 months ago

Marking this issue as stale since it has been open for 14 days with no activity. This issue will be closed if no further activity occurs.