[Feat]: Integrate vision model in chat.

yatendra2001 commented 8 months ago

Description

At present, Gemini-pro-vision doesn't support multi-turn text-based conversations. However, there is a need for multi-turn multi-modal chat capabilities.

One option is to await the implementation of multi-turn multi-modal functionality in Gemini. Alternatively, as a temporary solution, we can consider attaching images as discrete features to enrich the conversation experience.

What do you think @samyakkkk ?

samyakkkk commented 8 months ago

@superiorsd10 would you like to take this up next?

samyakkkk commented 8 months ago

Let's keep it simple. Multimodal chat works the same way as normal chat.

We will just add a square image-selection icon to the left of the text field. Users can use it to attach one or more images and send a message to Gemini.
Since the multimodal models don't support multi-turn chats, if a user sends a follow-up, we show an error snackbar saying: "Follow up not allowed with images. Please clear existing chat to send a new message."
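
A minimal sketch of the follow-up check described above, assuming the chat history is kept as an array of messages. The `ChatMessage` shape and the `checkFollowUp` helper are hypothetical names for illustration, not code from the extension:

```typescript
// Hypothetical message shape; the extension's real types may differ.
interface ChatMessage {
  role: "user" | "model";
  text: string;
  imagePaths: string[]; // empty for text-only messages
}

// Snackbar text quoted from the comment above.
const FOLLOW_UP_ERROR =
  "Follow up not allowed with images. Please clear existing chat to send a new message.";

// Returns the error text if the chat already contains an image turn
// (so a new message would be a follow-up), otherwise null.
function checkFollowUp(history: ChatMessage[]): string | null {
  const hasImageTurn = history.some((m) => m.imagePaths.length > 0);
  return hasImageTurn ? FOLLOW_UP_ERROR : null;
}
```

The caller would run this before sending and show the returned string in the snackbar when it is not null. Clearing the chat empties the history, so the next message passes the check.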

superiorsd10 commented 8 months ago

> @superiorsd10 would you like to take this up next?

Yes, I would like to work on this. But first I'll have to work on #148, so that users can clear the existing chat.

yatendra2001 commented 8 months ago

just looping in, @superiorsd10 check out generateTextFromImage in gemini-repository.ts. It can particularly help in this case.

superiorsd10 commented 8 months ago

> just looping in, @superiorsd10 check out generateTextFromImage in gemini-repository.ts. It can particularly help in this case.

Sure, thanks for the help :)

superiorsd10 commented 7 months ago

Hello @samyakkkk and @yatendra2001 👋

I wanted to share the approach to integrating this feature into the extension. Please review the plan, and if any changes or adjustments are needed, your feedback would be greatly appreciated. If everything looks good, I'm excited to start working on the implementation.

Here's the proposed approach (as understood by me from the above conversation):

1. Introduce an image selection box positioned to the left of the prompt text field.
2. Users can tap the image selection box to choose a single image (accepted formats: 'png', 'jpg', 'jpeg', 'gif', 'bmp').
3. Allow users to enter a prompt text alongside the selected image.
4. Call the generateTextFromImage function, passing the prompt, selected image, and image type for processing.
5. Display the result obtained from generateTextFromImage in the chat container as a message generated by the model.
6. Implement error handling for attempting to add another image in the same conversation without clearing the chat history.
7. In case of a follow-up attempt with an image present, show an error snackbar instructing the user to clear the chat history before sending a new message.
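
The steps above can be sketched roughly as follows. This is only an illustration under assumptions: the real signature of generateTextFromImage in gemini-repository.ts is not shown in this thread, so it is passed in as a parameter with an assumed shape (prompt, base64 image, image type). `imageFormatOf` and `sendImagePrompt` are hypothetical helper names.

```typescript
// Accepted formats, taken from the plan above.
const ACCEPTED_FORMATS = ["png", "jpg", "jpeg", "gif", "bmp"];

// Returns the lower-cased file extension if it is an accepted format, else null.
function imageFormatOf(fileName: string): string | null {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0 || dot === fileName.length - 1) return null;
  const ext = fileName.slice(dot + 1).toLowerCase();
  return ACCEPTED_FORMATS.indexOf(ext) !== -1 ? ext : null;
}

// ASSUMED signature; the actual function in gemini-repository.ts may differ.
type GenerateTextFromImage = (
  prompt: string,
  imageBase64: string,
  imageType: string
) => Promise<string>;

type SendResult = { ok: true; reply: string } | { ok: false; error: string };

// Validates the input, then forwards the prompt and image to the model.
async function sendImagePrompt(
  generate: GenerateTextFromImage,
  prompt: string,
  fileName: string,
  imageBase64: string,
  chatAlreadyHasImage: boolean
): Promise<SendResult> {
  if (chatAlreadyHasImage) {
    // Steps 6 and 7: no second image or follow-up until the chat is cleared.
    return { ok: false, error: "Please clear the chat history before sending a new message." };
  }
  const imageType = imageFormatOf(fileName);
  if (imageType === null) {
    return { ok: false, error: "Unsupported image format: " + fileName };
  }
  // Steps 4 and 5: call the model and hand the reply back to the chat container.
  const reply = await generate(prompt, imageBase64, imageType);
  return { ok: true, reply };
}
```

Returning a result object instead of throwing keeps the snackbar logic in one place: the UI shows `error` when `ok` is false and appends `reply` as a model message otherwise.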

Thank you,

samyakkkk commented 7 months ago

@superiorsd10 thanks for the clear plan of action. lgtm! let's execute.

superiorsd10 commented 7 months ago

Hello @yatendra2001 👋

I want to show the selected image in the user's message (in the chat-container) along with the prompt, but it's not working: only the alt text is shown.

For debugging purposes, I hardcoded the image path, but the image still isn't shown.

Can you please help me with this? Am I missing something here?

Thank you,

samyakkkk commented 7 months ago

Hi @superiorsd10, can we connect in our community channel? https://join.slack.com/t/welltested-ai/shared_invite/zt-25u09fty8-gaggH9HbmopB~4tialTrlA

We will be able to work closely with you there. Please send me a 👋.

CommandDash / commanddash

[Feat]: Integrate vision model in chat. #121
