Open ntindle opened 1 month ago
From boosterbot.ai, posted before we set up the GitHub integration to auto-comment the responses
forge/forge/llm/providers/multi.py
- Defines the MultiProvider class, which serves as a unified interface to access multiple chat model providers. Consider updating the MultiProvider class in forge/forge/llm/providers/multi.py to include methods for processing Image, Video, and Audio inputs. Ensure that these methods route the inputs to the appropriate LLMs based on their configurations.

autogpt/autogpt/app/configurator.py
- Handles configuration settings and overrides within the AutoGPT application. Consider updating the apply_overrides_to_config function in autogpt/autogpt/app/configurator.py to include new fields for Image, Video, and Audio support in the LLM configurations.

autogpt/autogpt/app/main.py
- Serves as the entry point for the application, managing setup, configuration, and execution. Consider updating the main application logic in autogpt/autogpt/app/main.py to handle Image, Video, and Audio inputs and route them to the appropriate processing components.

autogpt/autogpt/app/agent_protocol_server.py
- Manages tasks, steps, artifacts, and interactions with agents in the AutoGPT system. Consider updating the AgentProtocolServer class in autogpt/autogpt/app/agent_protocol_server.py to handle Image, Video, and Audio inputs and ensure they are processed correctly.

forge/forge/components/image_gen/image_gen.py
- Provides commands to generate images from text prompts using various providers. Consider using the ImageGeneratorComponent in forge/forge/components/image_gen/image_gen.py as a reference for handling Image inputs, and extend similar logic to Video and Audio inputs.

forge/forge/llm/providers/schema.py
- Defines models and structures related to the language and embedding models used in the system. Consider updating the models and structures in forge/forge/llm/providers/schema.py to include support for Image, Video, and Audio inputs.

forge/forge/llm/providers/openai.py
- Manages interactions with OpenAI's API for chat completions and embeddings. Consider updating the OpenAIProvider class in forge/forge/llm/providers/openai.py to handle Image, Video, and Audio inputs if OpenAI supports these features.

forge/forge/llm/providers/groq.py
- Manages interactions with Groq's API for chat completions and embeddings. Consider updating the GroqProvider class in forge/forge/llm/providers/groq.py to handle Image, Video, and Audio inputs if Groq supports these features.

forge/forge/llm/providers/__init__.py
- Initializes the various provider modules within the LLM system. Consider updating the __init__.py file in forge/forge/llm/providers to include the new input types for Image, Video, and Audio.

autogpt/autogpt/app/input.py
- Handles user input within the AutoGPT application. Consider updating the clean_input function in autogpt/autogpt/app/input.py to handle the new input types for Image, Video, and Audio.
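The MultiProvider change suggested above could be sketched roughly as follows. This is illustrative only: the `Modality` enum, `ProviderStub`, and `process_media` names are assumptions for this issue, not the actual forge API.

```python
# Minimal sketch of modality-aware routing in a MultiProvider-like class.
# All names here are hypothetical; only the routing idea is the point.
from dataclasses import dataclass, field
from enum import Enum, auto


class Modality(Enum):
    IMAGE = auto()
    VIDEO = auto()
    AUDIO = auto()


@dataclass
class ProviderStub:
    """Stand-in for a chat provider whose config declares supported modalities."""
    name: str
    supported_modalities: set = field(default_factory=set)

    def process(self, modality: Modality, payload: bytes) -> str:
        # A real provider would call its API here; we just report the routing.
        return f"{self.name} processed {modality.name} ({len(payload)} bytes)"


class MultiProviderSketch:
    """Routes a media input to the first provider configured to support it."""

    def __init__(self, providers):
        self.providers = providers

    def process_media(self, modality: Modality, payload: bytes) -> str:
        for provider in self.providers:
            if modality in provider.supported_modalities:
                return provider.process(modality, payload)
        raise ValueError(f"No configured provider supports {modality.name}")
```

The key design point is that routing consults per-provider configuration rather than hard-coding a provider per modality, which matches how MultiProvider already abstracts over chat providers.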
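For the configurator suggestion, a hedged sketch of what new multimodal config fields and an override pass might look like. The field names and the helper below are invented for illustration and do not reflect the real apply_overrides_to_config signature.

```python
# Hypothetical multimodal flags on an LLM config, plus an override pass
# in the spirit of apply_overrides_to_config. Field names are assumptions.
from dataclasses import dataclass


@dataclass
class LLMConfigSketch:
    model: str = "gpt-4o"
    image_input_enabled: bool = False
    video_input_enabled: bool = False
    audio_input_enabled: bool = False


def apply_multimodal_overrides(config: LLMConfigSketch, overrides: dict) -> LLMConfigSketch:
    """Copy any recognized multimodal override keys onto the config."""
    for key in ("image_input_enabled", "video_input_enabled", "audio_input_enabled"):
        if key in overrides:
            setattr(config, key, bool(overrides[key]))
    return config
```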
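For the schema.py suggestion, one plausible shape is a tagged union of message content parts, similar to how multimodal chat APIs mix text and media in one message. The class names below are assumptions, not the actual forge schema.

```python
# Sketch of multimodal message content parts for a schema like
# forge/forge/llm/providers/schema.py. All class names are hypothetical.
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TextContent:
    text: str


@dataclass(frozen=True)
class ImageContent:
    url: str
    detail: str = "auto"  # mirrors the low/high/auto idea some APIs expose


@dataclass(frozen=True)
class VideoContent:
    url: str


@dataclass(frozen=True)
class AudioContent:
    url: str
    format: str = "wav"


MessagePart = Union[TextContent, ImageContent, VideoContent, AudioContent]


@dataclass
class MultimodalChatMessage:
    role: str
    content: list  # list[MessagePart]: text and media parts interleaved
```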
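For the clean_input suggestion, a minimal sketch of classifying raw user input before routing it; the extension table and function name are assumptions for illustration, not the actual clean_input behavior.

```python
# Hypothetical pre-routing step for autogpt/autogpt/app/input.py:
# decide whether a raw input string refers to a media file or is plain text.
from pathlib import Path

# Invented mapping for the sketch; a real version would likely use MIME sniffing.
MEDIA_EXTENSIONS = {
    ".png": "image", ".jpg": "image", ".jpeg": "image",
    ".mp4": "video", ".mov": "video",
    ".wav": "audio", ".mp3": "audio",
}


def classify_input(raw: str) -> str:
    """Return 'image', 'video', 'audio', or 'text' for a raw input string."""
    suffix = Path(raw.strip()).suffix.lower()
    return MEDIA_EXTENSIONS.get(suffix, "text")
```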
Summary 💡
Adding support for Image, Video, and Audio inputs to the AutoGPT system involves more than accepting them at the FastAPI server level: the inputs must also be passed through the MultiProvider for LLMs, and which LLMs support which modalities must be checked as part of their configs.

Examples 🌈
No response
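The capability check described in the summary could be as simple as a per-model modality set consulted at routing time. The table below is invented example data, not a claim about what these models actually support.

```python
# Sketch of a per-model capability lookup; entries are illustrative only.
MODEL_CAPABILITIES = {
    "gpt-4o": {"text", "image", "audio"},
    "llama-3.1-70b": {"text"},
}


def supports(model: str, modality: str) -> bool:
    """True if the given model's config lists the modality as supported."""
    return modality in MODEL_CAPABILITIES.get(model, set())
```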
Motivation 🔦
The future of Agents is multimodal