CsabaConsulting / InspectorGadgetApp

Open Multi-Modal Personal Assistant
MIT License

Being able to edit images #29

Open MrCsabaToth opened 1 month ago

MrCsabaToth commented 1 month ago

Once the app can generate images (#24), meaning it can receive an image rather than just text and can display the result, we should be able to edit images as well. Maybe this feature won't even need any extra coding?

MrCsabaToth commented 3 weeks ago

Even though some demos show potential image or audio outputs, Gemini's multi-modality is currently input-only: https://www.linkedin.com/posts/netskink_how-to-make-an-audio-podcast-as-demonstrated-activity-7230943255578697729-hGBI

I've seen other assistant projects that also use STT and TTS, as I do. We can consider Imagen3 for image-related generation or edits, but that would be a separate interaction mode, since the prompt would need to be passed to Imagen3 and not Gemini. Similarly, for music or audio generation we'd need a dedicated interaction; maybe it could be an extension of the Shazam mode? #38
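
A rough sketch of what such a separate interaction mode could look like, written in Python purely for illustration. Everything here is an assumption: the names `InteractionMode`, `Request`, `route`, and the `send_to_*` functions are hypothetical, and the backends are stubs that only tag the prompt instead of calling any real Gemini or Imagen3 API.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict


class InteractionMode(Enum):
    """Hypothetical interaction modes; not taken from the app's code."""
    CHAT = auto()   # multi-modal input, text output
    IMAGE = auto()  # image generation or editing
    AUDIO = auto()  # music/audio generation (possible Shazam mode extension)


@dataclass
class Request:
    mode: InteractionMode
    prompt: str


# Stub backends: a real app would call the respective model service here.
def send_to_gemini(prompt: str) -> str:
    return f"[gemini] {prompt}"


def send_to_imagen(prompt: str) -> str:
    return f"[imagen3] {prompt}"


def send_to_audio_model(prompt: str) -> str:
    return f"[audio] {prompt}"


# The mode, not the prompt content, decides which backend gets the prompt.
BACKENDS: Dict[InteractionMode, Callable[[str], str]] = {
    InteractionMode.CHAT: send_to_gemini,
    InteractionMode.IMAGE: send_to_imagen,
    InteractionMode.AUDIO: send_to_audio_model,
}


def route(request: Request) -> str:
    """Dispatch the prompt to the backend registered for the request's mode."""
    return BACKENDS[request.mode](request.prompt)


if __name__ == "__main__":
    print(route(Request(InteractionMode.IMAGE, "remove the background")))
    print(route(Request(InteractionMode.CHAT, "what is in this photo?")))
```

Keeping the dispatch in a single table would mean an audio mode could be added later by registering one more entry, without touching the chat path.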

MrCsabaToth commented 2 weeks ago

Gemini Advanced itself relies on Imagen3. This is the way: https://www.theverge.com/2024/8/28/24230445/google-gemini-create-ai-generated-people-imagen-3