CsabaConsulting / InspectorGadgetApp

Open Multi-Modal Personal Agent
MIT License

Add Multi Modal capability #9

Closed by MrCsabaToth 2 weeks ago

MrCsabaToth commented 1 month ago

Allow asking questions about what the camera sees or about images from an album. Maybe even video. There are several articles:

MrCsabaToth commented 1 month ago

The UI has been redesigned a little; multi-modal mode now has its own button. I imagine multi-modal could be an extra step either before the recording + STT or during those steps. Architecturally, it may be better to separate the photo taking / picking step from the recording / STT steps. The photo taking could be its own Cubit + View Page; once it succeeds, it can simply be popped from the navigation stack and the flow continues with the uni-modal LLM modes, so we can completely reuse the Interaction View and Page. Of course, the image (or video?) would be passed on as an extra parameter.
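
A minimal sketch of that separation, in plain Dart so it runs without Flutter or flutter_bloc. All names here (`PickedMedia`, `PhotoStep`, `describeInteraction`) are hypothetical and not taken from the app's code; `PhotoStep` only stands in for the proposed photo Cubit, and `describeInteraction` stands in for the existing interaction flow.

```dart
import 'dart:typed_data';

/// Hypothetical result of the photo taking / picking step.
class PickedMedia {
  const PickedMedia(this.bytes, this.mimeType);
  final Uint8List bytes;
  final String mimeType;
}

/// Plain-Dart stand-in for the photo Cubit (assumption: the real one
/// would extend Cubit from flutter_bloc and emit states instead).
/// State stays null until a photo has been taken or picked.
class PhotoStep {
  PickedMedia? state;

  /// In the app this would wrap the camera or the gallery picker.
  void photoPicked(Uint8List bytes, String mimeType) {
    state = PickedMedia(bytes, mimeType);
  }
}

/// Stand-in for the existing interaction flow (recording + STT + LLM).
/// The media is the only new parameter and it is optional, so the
/// uni-modal call sites keep working unchanged.
String describeInteraction(String transcript, {PickedMedia? media}) {
  if (media == null) {
    return 'uni-modal: "$transcript"';
  }
  return 'multi-modal: "$transcript" + ${media.mimeType} '
      '(${media.bytes.length} bytes)';
}

void main() {
  // Uni-modal mode: go straight to the interaction step.
  print(describeInteraction('What time is it?'));

  // Multi-modal mode: run the photo step first, then hand its result on.
  final photoStep = PhotoStep()
    ..photoPicked(Uint8List.fromList([0xFF, 0xD8, 0xFF]), 'image/jpeg');
  print(describeInteraction('What is in this picture?',
      media: photoStep.state));
}
```

In the Flutter app, the hand-off between the two pages could use the navigation stack itself: the photo page calls `Navigator.pop(context, media)` and the caller awaits the matching `Navigator.push`, then forwards the result to the Interaction Page as the extra parameter.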

MrCsabaToth commented 2 weeks ago

Another code sample: https://github.com/google-gemini/generative-ai-dart/blob/main/samples/dart/bin/simple_text_and_image.dart
And an example app: https://github.com/google-gemini/generative-ai-dart/blob/main/samples/flutter_app/lib/main.dart