CommandDash / commanddash

AI assist to integrate APIs and SDKs without reading docs.
https://commanddash.io
Apache License 2.0

[Feat]: Integrate vision model in chat. #121

Open yatendra2001 opened 8 months ago

yatendra2001 commented 8 months ago

Description

At present, Gemini-pro-vision doesn't support multi-turn text-based conversations. However, there is a need for multi-turn multi-modal chat capabilities.

One option is to await the implementation of multi-turn multi-modal functionality in Gemini. Alternatively, as a temporary solution, we can consider attaching images as discrete features to enrich the conversation experience.

What do you think @samyakkkk ?

[Image: Screenshot 2023-12-23 at 2 07 51 PM]

samyakkkk commented 8 months ago

@superiorsd10 would you like to take this up next?

samyakkkk commented 8 months ago

Let's keep it simple. Multimodal chat works the same way as normal chat.

  1. We will just add a square image-selection icon to the left of the text field. Users can use it to attach one or multiple images and send a message to Gemini.
  2. Since the multimodal models don't support multi-turn chats, if a user sends a follow-up, we show an error snackbar saying: "Follow up not allowed with images. Please clear existing chat to send a new message."
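
The follow-up rule in step 2 could be sketched roughly as below. This is only an illustration under assumptions: `ChatMessage`, `SendDecision`, and `checkCanSend` are hypothetical names, not taken from the extension's code, and the snackbar itself is left to the caller.

```typescript
// Hypothetical message shape; the extension's real chat model may differ.
interface ChatMessage {
  role: "user" | "model";
  text: string;
  imagePaths: string[]; // empty for text-only messages
}

// Error text quoted from the plan above.
const FOLLOW_UP_ERROR =
  "Follow up not allowed with images. Please clear existing chat to send a new message.";

type SendDecision = { allowed: true } | { allowed: false; error: string };

// Blocks a new message as soon as any earlier message in the chat carried an image.
function checkCanSend(history: ChatMessage[]): SendDecision {
  const hasImageTurn = history.some((message) => message.imagePaths.length > 0);
  return hasImageTurn
    ? { allowed: false, error: FOLLOW_UP_ERROR }
    : { allowed: true };
}
```

The caller would run `checkCanSend` before sending and show `error` in the snackbar when `allowed` is false. Clearing the chat empties the history, so the next message goes through again.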

superiorsd10 commented 8 months ago

> @superiorsd10 would you like to take this up next?

Yes, I would like to work on this. But first I'll have to work on #148, so that users can clear the existing chat.

yatendra2001 commented 8 months ago

Just looping in: @superiorsd10, check out generateTextFromImage in gemini-repository.ts. It can particularly help in this case.
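
For context, a rough sketch of how a prompt and images are typically packaged for a Gemini vision model. The signature of generateTextFromImage is not shown in this thread, so `VisionModel`, `buildVisionParts`, and `askAboutImages` are hypothetical stand-ins. The `inlineData` shape (base64 data plus a MIME type) follows the Gemini API's inline image format.

```typescript
// Inline image part: base64-encoded bytes plus a MIME type.
interface InlineImagePart {
  inlineData: { data: string; mimeType: string };
}

type PromptPart = string | InlineImagePart;

// Hypothetical minimal model interface, so the sketch does not depend on a specific SDK.
interface VisionModel {
  generateContent(parts: PromptPart[]): Promise<string>;
}

interface ImageInput {
  bytes: Uint8Array;
  mimeType: string; // e.g. "image/png"
}

function toInlineImagePart(image: ImageInput): InlineImagePart {
  return {
    inlineData: {
      data: Buffer.from(image.bytes).toString("base64"),
      mimeType: image.mimeType,
    },
  };
}

// The text prompt goes first, followed by one part per attached image.
function buildVisionParts(prompt: string, images: ImageInput[]): PromptPart[] {
  return [prompt, ...images.map(toInlineImagePart)];
}

// Single-turn request: no chat history is sent along with the images.
async function askAboutImages(
  model: VisionModel,
  prompt: string,
  images: ImageInput[],
): Promise<string> {
  return model.generateContent(buildVisionParts(prompt, images));
}
```

Because each call is single-turn, this fits the constraint that gemini-pro-vision does not support multi-turn conversations.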

superiorsd10 commented 8 months ago

> just looping in, @superiorsd10 check out generateTextFromImage in gemini-repository.ts. It can particularly help in this case.

Sure, thanks for the help :)

superiorsd10 commented 7 months ago

Hello @samyakkkk and @yatendra2001 👋

I wanted to share the approach to integrating this feature into the extension. Please review the plan, and if any changes or adjustments are needed, your feedback would be greatly appreciated. If everything looks good, I'm excited to start working on the implementation.

Here's the proposed approach (as understood by me from the above conversation):

Thank you,

samyakkkk commented 7 months ago

@superiorsd10 thanks for the clear plan of action. lgtm! let's execute.

superiorsd10 commented 7 months ago

Hello @yatendra2001 👋

I want to show the selected image in the user's message (in the chat-container) with the prompt, but it's not working, and only alt text is being shown.

For debugging purposes, I hardcoded the image path, but the image still isn't shown.
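
One possible cause, not confirmed anywhere in this thread: VS Code webviews cannot load local files from a plain file-system path. A path normally has to be converted with `webview.asWebviewUri` (with its folder listed in `localResourceRoots`), or the image has to be embedded directly. Below is a minimal sketch of the embedding route, assuming the webview's content security policy allows `data:` in `img-src`. `imageDataUri` and `imageTag` are hypothetical helper names.

```typescript
// Map a file extension to an image MIME type; extend as needed.
const MIME_BY_EXTENSION: Record<string, string> = {
  png: "image/png",
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  gif: "image/gif",
  webp: "image/webp",
};

// Encode raw image bytes as a data URI that an <img> tag can load without file access.
function imageDataUri(bytes: Uint8Array, fileName: string): string {
  const extension = fileName.split(".").pop()?.toLowerCase() ?? "";
  const mimeType = MIME_BY_EXTENSION[extension];
  if (!mimeType) {
    throw new Error(`Unsupported image type: ${fileName}`);
  }
  return `data:${mimeType};base64,${Buffer.from(bytes).toString("base64")}`;
}

// Escape the characters that would break out of a double-quoted HTML attribute.
function escapeAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build the <img> markup for the user's message in the chat container.
function imageTag(bytes: Uint8Array, fileName: string, altText: string): string {
  return `<img src="${imageDataUri(bytes, fileName)}" alt="${escapeAttribute(altText)}">`;
}
```

In the extension, the bytes could come from `fs.readFileSync(path)` or `vscode.workspace.fs.readFile(uri)`, and the resulting markup would be posted to the webview along with the prompt text.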

Can you please help me with this? Am I missing something here?

Thank you,

samyakkkk commented 7 months ago

Hi @superiorsd10, can we connect in our community channel? https://join.slack.com/t/welltested-ai/shared_invite/zt-25u09fty8-gaggH9HbmopB~4tialTrlA

We will be able to work closely with you there. Please send me a 👋.