Open yatendra2001 opened 8 months ago
@superiorsd10 would you like to take this up next?
Let's keep it simple. Multimodal chat works the same way as normal chat.
Yes, I would like to work on this. But first, I'll need to work on #148 so that users can clear the existing chat.
Just looping in: @superiorsd10, check out generateTextFromImage in gemini-repository.ts. It could be particularly helpful in this case.
Sure, thanks for the help :)
Hello @samyakkkk and @yatendra2001 👋
I'd like to share my approach to integrating this feature into the extension. Please review the plan; if any changes or adjustments are needed, I'd appreciate your feedback. If everything looks good, I'm excited to start on the implementation.
Here's the proposed approach, based on my understanding of the conversation above:
1. Call the generateTextFromImage function, passing the prompt, selected image, and image type for processing.
2. Display the response from generateTextFromImage in the chat container as a message generated by the model.

Thank you,
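For illustration, here is a minimal TypeScript sketch of the plan above. It assumes that generateTextFromImage takes the prompt, the base64-encoded image, and its MIME type and resolves to the model's text; that signature, and the names ChatMessage, inferMimeType, and sendImagePrompt, are hypothetical, and the real function in gemini-repository.ts may differ. The Gemini call is passed in as a parameter so the flow can be exercised without network access.

```typescript
// Sketch only: this GenerateTextFromImage signature is an assumption,
// not the actual one in gemini-repository.ts.
type GenerateTextFromImage = (
  prompt: string,
  imageBase64: string,
  mimeType: string,
) => Promise<string>;

interface ChatMessage {
  role: "user" | "model";
  text: string;
  // Only user messages carry the selected image.
  image?: { base64: string; mimeType: string };
}

// Hypothetical helper: derive the "image type" from the file name.
// This extension list is an assumption, not Gemini's full supported set.
function inferMimeType(fileName: string): string | undefined {
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  switch (ext) {
    case "png":
      return "image/png";
    case "jpg":
    case "jpeg":
      return "image/jpeg";
    case "webp":
      return "image/webp";
    default:
      return undefined;
  }
}

// Hypothetical handler: build the user's message (prompt + image), call the
// model, and return the history with both the user message and the model's
// reply appended.
async function sendImagePrompt(
  history: ChatMessage[],
  prompt: string,
  fileName: string,
  imageBase64: string,
  generateTextFromImage: GenerateTextFromImage,
): Promise<ChatMessage[]> {
  const mimeType = inferMimeType(fileName);
  if (mimeType === undefined) {
    throw new Error(`Unsupported image type: ${fileName}`);
  }
  const userMessage: ChatMessage = {
    role: "user",
    text: prompt,
    image: { base64: imageBase64, mimeType },
  };
  const reply = await generateTextFromImage(prompt, imageBase64, mimeType);
  return [...history, userMessage, { role: "model", text: reply }];
}
```

Returning a new array instead of mutating the history is a choice of this sketch; it keeps the chat container's state simple to re-render after each reply.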
@superiorsd10, thanks for the clear plan of action. LGTM! Let's execute.
Hello @yatendra2001 👋
I want to show the selected image alongside the prompt in the user's message (in the chat container), but it isn't working: only the alt text is shown.
For debugging purposes, I hardcoded the image path, but the image still isn't shown.
Can you please help me with this? Am I missing something here?
Thank you,
Hi @superiorsd10, can we connect in our community channel? https://join.slack.com/t/welltested-ai/shared_invite/zt-25u09fty8-gaggH9HbmopB~4tialTrlA
We'll be able to work with you more closely there. Please send me a 👋.
Description
At present, Gemini-pro-vision doesn't support multi-turn, text-based conversations. However, we need multi-turn multimodal chat capabilities.
One option is to wait until Gemini implements multi-turn multimodal functionality. Alternatively, as a temporary solution, we could treat image attachments as a separate, standalone feature that enriches the conversation experience.
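One possible reading of the temporary solution: keep normal text chat multi-turn, but send any message that has an image as a discrete, single-turn request, so no image ever has to travel in the conversation history. Below is a minimal TypeScript sketch under that assumption. The Turn, OutgoingMessage, ChatBackends, and routeMessage names are hypothetical, and so is the choice to record each image turn in the history as plain text.

```typescript
// Sketch only: both backend functions are hypothetical stand-ins for a
// multi-turn text chat call and a single-turn image call.
interface Turn {
  role: "user" | "model";
  text: string;
}

interface OutgoingMessage {
  text: string;
  image?: { base64: string; mimeType: string };
}

interface ChatBackends {
  chatWithHistory: (history: Turn[], text: string) => Promise<string>;
  describeImage: (prompt: string, imageBase64: string, mimeType: string) => Promise<string>;
}

// Send one message and return the updated, text-only history.
async function routeMessage(
  history: Turn[],
  message: OutgoingMessage,
  backends: ChatBackends,
): Promise<Turn[]> {
  let reply: string;
  if (message.image !== undefined) {
    // Image turns ignore earlier history: each one is a discrete request.
    reply = await backends.describeImage(message.text, message.image.base64, message.image.mimeType);
  } else {
    // Text turns go through the normal multi-turn chat with full history.
    reply = await backends.chatWithHistory(history, message.text);
  }
  // Record both turns as text so later text-only turns can refer to the answer.
  return [...history, { role: "user", text: message.text }, { role: "model", text: reply }];
}
```

Keeping the history text-only means the multi-turn path never needs image support; the trade-off is that a follow-up question cannot re-examine the image itself, only the model's earlier description of it.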
What do you think, @samyakkkk?