RMNCLDYO / gemini-ai-toolkit

Unlock the potential of Google's Gemini AI models with this versatile toolkit. Offering seamless chat, text generation, and multimodal interactions, supporting various file types, including PDF's, images, videos, audio, text and more. Enjoy real-time responses, customizable parameters, and easy integration for diverse AI tasks.
MIT License
40 stars 9 forks source link
artificial-intelligence audio-transcribing chatbot conversational-ai gemini gemini-advanced gemini-api gemini-flash gemini-pro gemini-pro-1-5-experimental gemini-pro-api gemini-pro-flash gemini-pro-vision google google-api google-deepmind google-gemini image-analysis text-processing video-processing

Google Gemini AI

Gemini AI Toolkit

maintained - yes contributions - welcome

Google Gemini AI

> [!NOTE] > *This toolkit supports Google's newest Gemini 1.5 Pro and Flash experimental models (as of August 2024)* **Unleash the power of Google's Gemini AI models with a versatile and user-friendly toolkit.** Gemini AI Toolkit is a powerful interface for seamless integration with Google's cutting-edge Gemini language models, offering chat, text generation, and multimodal interactions in one comprehensive package. ## πŸš€ Features - **Multimodal Interaction**: Process and analyze various file types (PDFs, images, videos, audio, text, documents, code and more) - **Interactive Chat**: Engage in real-time, context-aware conversations - **Text Generation**: Create high-quality content based on prompts - **File Handling**: Upload and process local files and URLs with automatic temporary storage management - **Customizable Parameters**: Fine-tune AI interactions (temperature, token limits, safety thresholds, etc.) - **Streaming Responses**: Receive AI-generated content in real-time - **JSON Output**: Generate structured data for easy system integration - **Lightweight Design**: Minimal dependencies, primarily using the `requests` package ## πŸ“‹ Table of Contents - [Installation](#-installation) - [API Key Configuration](#-configuration) - [Usage](#-usage) - [Special Commands](#-special-commands) - [Advanced Configuration](#-advanced-configuration) - [Supported Models](#-supported-models) - [Error Handling and Safety](#-error-handling-and-safety) - [Supported File Types](#-supported-file-types) - [Caching and Cleanup](#-caching-and-cleanup) - [Contributing](#-contributing) - [Reporting Issues](#-issues-and-support) - [Submitting Pull Requests](#-feature-requests) - [Versioning and Changelog](#-versioning-and-changelog) - [Security](#-security) - [License](#-license) ## πŸ›  Installation 1. Clone the repository: ```bash git clone https://github.com/RMNCLDYO/gemini-ai-toolkit.git ``` 2. Navigate to the repository folder: ```bash cd gemini-ai-toolkit ``` 3. Install the required dependencies: ```bash pip install -r requirements.txt ``` ## πŸ”‘ Configuration 1. Obtain an API key from [Google AI Studio](https://aistudio.google.com/app/apikey). 2. You have three options for managing your API key:
Click here to view the API key configuration options - **Setting it as an environment variable on your device (recommended for everyday use)** - Navigate to your terminal. - Add your API key like so: ```shell export GEMINI_API_KEY=your_api_key ``` This method allows the API key to be loaded automatically when using the wrapper or CLI. - **Using an .env file (recommended for development):** - Install python-dotenv if you haven't already: `pip install python-dotenv`. - Create a .env file in the project's root directory or rename `example.env` in the root folder to `.env` and replace `your_api_key_here` with your API key. - Add your API key to the .env file like so: ```makefile GEMINI_API_KEY=your_api_key ``` This method allows the API key to be loaded automatically when using the wrapper or CLI, assuming you have python-dotenv installed and set up correctly. - **Direct Input:** - If you prefer not to use a `.env` file, you can directly pass your API key as an argument to the CLI or the wrapper functions. ***CLI*** ```shell --api_key "your_api_key" ``` ***Wrapper*** ```shell api_key="your_api_key" ``` This method requires manually inputting your API key each time you initiate an API call, ensuring flexibility for different deployment environments.
## πŸ’» Usage ### Multimodal Mode *For processing multiple input types including audio, video, text, images, code and a wide range of files. This mode allows you to upload files (from local paths or URLs), chat with the AI about the content, and maintain a knowledge base throughout the conversation.* ***CLI*** ```bash python cli.py --multimodal --prompt "Analyze both of these files and provide a summary of each, one by one. Don't overlook any details." --files file1.jpg https://example.com/file2.pdf ``` ***Wrapper*** ```python from gemini import Multimodal Multimodal().run(prompt="Analyze both of these files and provide a summary of each, one by one. Don't overlook any details.", files=["file1.jpg", "https://example.com/file2.pdf"]) ``` ### Chat Mode *For interactive conversations with the AI model.* ***CLI*** ```bash python cli.py --chat ``` ***Wrapper*** ```python from gemini import Chat Chat().run() ``` ### Text Mode *For generating text based on a prompt or a set of instructions.* ***CLI*** ```bash python cli.py --text --prompt "Write a story about a magic backpack." ``` ***Wrapper*** ```python from gemini import Text Text().run(prompt="Write a story about a magic backpack.") ``` ## πŸ”§ Special Commands During interaction with the toolkit, you can use the following special commands: - `/exit` or `/quit`: End the conversation and exit the program. - `/clear`: Clear the conversation history (useful for saving API credits). - `/upload`: Upload a file for multimodal processing. - Usage: `/upload file_path_and_or_url [optional prompt]` - Example: `/upload file1.jpg https://example.com/file2.pdf Analyze the files and provide a summary of each` ## βš™οΈ Advanced Configuration | Description | CLI Flags | CLI Usage | Wrapper Usage | |-------------|-----------|-----------|---------------| | Chat mode | `-c`, `--chat` | `--chat` | *See mode usage above.* | | Text mode | `-t`, `--text` | `--text` | *See mode usage above.* | | Multimodal mode | `-m`, `--multimodal` | `--multimodal` | *See mode usage above.* | | User prompt | `-p`, `--prompt` | `--prompt "Your prompt here"` | `prompt="Your prompt here"` | | File inputs | `-f`, `--files` | `--files file1.jpg https://example.com/file2.pdf` | `files=["file1.jpg", "https://example.com/file2.pdf"]` | | Enable streaming | `-s`, `--stream` | `--stream` | `stream=True` | | Enable JSON output | `-js`, `--json` | `--json` | `json=True` | | API Key | `-ak`, `--api_key` | `--api_key "your_api_key"` | `api_key="your_api_key"` | | Model name | `-md`, `--model` | `--model "gemini-1.5-flash"` | `model="gemini-1.5-flash"` | | System prompt | `-sp`, `--system_prompt` | `--system_prompt "Set custom instructions"` | `system_prompt="Set custom instructions"` | | Max tokens | `-mt`, `--max_tokens` | `--max_tokens 1024` | `max_tokens=1024` | | Temperature | `-tm`, `--temperature` | `--temperature 0.7` | `temperature=0.7` | | Top-p | `-tp`, `--top_p` | `--top_p 0.9` | `top_p=0.9` | | Top-k | `-tk`, `--top_k` | `--top_k 40` | `top_k=40` | | Candidate count | `-cc`, `--candidate_count` | `--candidate_count 1` | `candidate_count=1` | | Stop sequences | `-ss`, `--stop_sequences` | `--stop_sequences ["\n", "."]` | `stop_sequences=["\n", "."]` | | Safety categories | `-sc`, `--safety_categories` | `--safety_categories ["HARM_CATEGORY_HARASSMENT"]` | `safety_categories=["HARM_CATEGORY_HARASSMENT"]` | | Safety thresholds | `-st`, `--safety_thresholds` | `--safety_thresholds ["BLOCK_NONE"]` | `safety_thresholds=["BLOCK_NONE"]` | ## πŸ“Š Supported Models ### Base Models | **Model** | **Inputs** | **Context Length** | |---|---|---| | `gemini-1.5-pro` | Text, images, audio, video | 8192 | | `gemini-1.5-flash` | Text, images, audio, video | 8192 | | `gemini-1.0-pro` | Text | 2048 | ### Experimental Models | **Model** | **Inputs** | **Context Length** | |---|---|---| | `gemini-1.5-pro-exp-0827` | Text, images, audio, video | 8192 | | `gemini-1.5-pro-exp-0801` | Text, images, audio, video | 8192 | | `gemini-1.5-flash-exp-0827` | Text, images, audio, video | 8192 | | `gemini-1.5-flash-8b-exp-0827` | Text, images, audio, video | 8192 | > [!NOTE] > *The availability of specific models may be subject to change. Always refer to Google's official documentation for the most up-to-date information on model availability and capabilities. See base models docs [here](https://ai.google.dev/gemini-api/docs/models/gemini) and experimental model docs [here](https://ai.google.dev/gemini-api/docs/models/experimental-models).* ## πŸ”’ Error Handling and Safety The Gemini AI Toolkit now includes robust error handling to help you diagnose and resolve issues quickly. Here are some common error codes and their solutions: | HTTP Code | Status | Description | Solution | |-----------|--------|-------------|----------| | 400 | INVALID_ARGUMENT | Malformed request body | Check API reference for correct format and supported versions | | 400 | FAILED_PRECONDITION | API not available in your country | Enable billing on your project in Google AI Studio | | 403 | PERMISSION_DENIED | API key lacks permissions | Verify API key and access rights | | 404 | NOT_FOUND | Resource not found | Check if all parameters are valid for your API version | | 429 | RESOURCE_EXHAUSTED | Rate limit exceeded | Ensure you're within model rate limits or request a quota increase | | 500 | INTERNAL | Unexpected error on Google's side | Retry after a short wait; report persistent issues | | 503 | UNAVAILABLE | Service temporarily overloaded/down | Retry after a short wait; report persistent issues | For rate limit errors (429), the toolkit will automatically pause for 15 seconds before retrying the request. ## πŸ“ Supported File Types The Gemini AI Toolkit supports a wide range of file types for multimodal processing. Here are the supported file extensions: | Category | File Extensions | |--------------------|-----------------| | **Images** | `jpg`, `jpeg`, `png`, `webp`, `gif`, `heic`, `heif` | | **Videos** | `mp4`, `mpeg`, `mpg`, `mov`, `avi`, `flv`, `webm`, `wmv`, `3gp` | | **Audio** | `wav`, `mp3`, `aiff`, `aac`, `ogg`, `flac` | | **Text/Documents** | `txt`, `html`, `css`, `js`, `ts`, `csv`, `md`, `py`, `json`, `xml`, `rtf`, `pdf` | > [!NOTE] > *Google's Files API lets you store up to 20 GB of files per project, with a per-file maximum size of 2 GB. Files are stored for 48 hours.* ## πŸ’Ύ Caching and Cleanup The Gemini AI Toolkit implements a caching mechanism for downloaded files to improve performance and reduce unnecessary network requests. Here's how it works: 1. When a file is downloaded from a URL, it's stored in a temporary cache folder (`.gemini_ai_toolkit_cache`). 2. The file will be used to process the request and will be stored locally due to Google's upload requirements. 3. The cache is automatically cleaned up at the end of each session to prevent accumulation of temporary files. You don't need to manage this cache manually, but it's good to be aware of its existence, especially if you're processing large files or have limited storage space. ## 🀝 Contributing Contributions are welcome! Please refer to [CONTRIBUTING.md](.github/CONTRIBUTING.md) for detailed guidelines on how to contribute to this project. ## πŸ› Issues and Support Encountered a bug? We'd love to hear about it. Please follow these steps to report any issues: 1. Check if the issue has already been reported. 2. Use the [Bug Report](.github/ISSUE_TEMPLATE/bug_report.md) template to create a detailed report. 3. Submit the report [here](https://github.com/RMNCLDYO/gemini-ai-toolkit/issues). Your report will help us make the project better for everyone. ## πŸ’‘ Feature Requests Got an idea for a new feature? Feel free to suggest it. Here's how: 1. Check if the feature has already been suggested or implemented. 2. Use the [Feature Request](.github/ISSUE_TEMPLATE/feature_request.md) template to create a detailed request. 3. Submit the request [here](https://github.com/RMNCLDYO/gemini-ai-toolkit/issues). Your suggestions for improvements are always welcome. ## πŸ” Versioning and Changelog Stay up-to-date with the latest changes and improvements in each version: - [CHANGELOG.md](.github/CHANGELOG.md) provides detailed descriptions of each release. ## πŸ” Security Your security is important to us. If you discover a security vulnerability, please follow our responsible disclosure guidelines found in [SECURITY.md](.github/SECURITY.md). Please refrain from disclosing any vulnerabilities publicly until said vulnerability has been reported and addressed. ## πŸ“„ License Licensed under the MIT License. See [LICENSE](LICENSE) for details.