lightaime opened this issue 1 year ago
I am interested in developing a multimodal platform for comparing different single- and multi-agent setups. A first step might be enabling multimodal input for LLMs, for example via a MultimodalPrompt. I think this feature needs some discussion. Are there any guidelines on what I should do or investigate before opening a pull request?
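As a starting point for discussion, here is a minimal sketch of what a MultimodalPrompt could look like. The class name is taken from the proposal above, but every field and method here is a hypothetical assumption, not an existing API:

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical sketch: field names ("image_paths", "audio_paths") and the
# to_content_parts() method are assumptions for discussion, not CAMEL APIs.
@dataclass
class MultimodalPrompt:
    text: str
    image_paths: List[str] = field(default_factory=list)
    audio_paths: List[str] = field(default_factory=list)

    def to_content_parts(self) -> List[Dict[str, str]]:
        """Flatten the prompt into a list of typed content parts that a
        multimodal backend could consume."""
        parts: List[Dict[str, str]] = [{"type": "text", "text": self.text}]
        parts += [{"type": "image", "path": p} for p in self.image_paths]
        parts += [{"type": "audio", "path": p} for p in self.audio_paths]
        return parts

prompt = MultimodalPrompt(text="Describe this scene.", image_paths=["scene.png"])
print(len(prompt.to_content_parts()))  # one text part + one image part
```

A structure like this would let text-only agents keep using the `text` field unchanged, while multimodal backends consume the full list of content parts.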
Required prerequisites
Motivation
The current framework uses two LLMs as the backend for role-playing, which limits it to text-only tasks. We would like to introduce multimodal agents that can solve tasks involving other modalities, such as vision and audio.
Solution
No response
Alternatives
No response
Additional context
No response