[Feature] 支持多模态对话功能：对话的问题和回复都支持文本、图片、音频

easeaico commented 8 months ago

你想要什么功能或者有什么建议？ 支持多模态对话功能：对话的问题和回复都支持文本、图片、音频。随着官方 ChatGPT 多模态的推出，期待未来 ChatGPT-Next-Web 有计划支持多模态的对话输入输出。

有没有可以参考的同类竞品？ 官方 ChatGPT 多模态的功能

Issues-translate-bot commented 8 months ago

Bot detected the issue body's language is not English, translate it automatically.

Title: [Feature] Support multi-modal dialogue function: dialogue questions and replies support text, pictures, and audio

What features do you want or have any suggestions? Support multi-modal dialogue function: dialogue questions and replies support text, pictures, and audio. With the launch of official ChatGPT multi-modality, we look forward to ChatGPT-Next-Web planning to support multi-modal conversation input and output in the future.

Are there any similar competing products that we can refer to? Official ChatGPT multi-modal functionality

chenminmin4 commented 8 months ago

希望支持多模态对话功能：对话的问题和回复都支持文本、图片、音频。

Issues-translate-bot commented 8 months ago

Bot detected the issue body's language is not English, translate it automatically.

Hope to support multi-modal dialogue function: dialogue questions and replies support text, pictures, and audio.

super999 commented 8 months ago

希望支持生成图片，语音，

Issues-translate-bot commented 8 months ago

Bot detected the issue body's language is not English, translate it automatically.

Hope to support generating pictures, voices,

gutenye commented 8 months ago

GPT-4 Turbo with vision https://openai.com/blog/new-models-and-developer-products-announced-at-devday

xiaosatai commented 8 months ago

希望支持多模态对话功能：对话的问题和回复都支持文本、图片、音频。同时支持文件上传，服务器缓存文件功能。

Issues-translate-bot commented 8 months ago

Bot detected the issue body's language is not English, translate it automatically.

Hope to support multi-modal dialogue function: dialogue questions and replies support text, pictures, and audio. It also supports file upload and server caching file functions.

mountainguan commented 8 months ago

github许愿池

Issues-translate-bot commented 8 months ago

Bot detected the issue body's language is not English, translate it automatically.

github wishing pool

DirkSchlossmacher commented 8 months ago

@Yidadaa would you appreciate any work on this (e.g. PoC) that you can leverage? Or are you in the middle of a major refactoring of the related app architecture?

Avey777 commented 7 months ago

多模态太重要了, 会带来无限可能.

Issues-translate-bot commented 7 months ago

Bot detected the issue body's language is not English, translate it automatically.

Multimodality is so important and will bring endless possibilities.

ikexue commented 7 months ago

Multi-modal support is desired: support for external file uploads and server caching, and support for conversations and replies in text, images, audio, and video.

xcatliu commented 7 months ago

gpt-4-vision-preview 无法显示完整对话，估计是 openai 的 bug。需要传 max_token: 4096

可以参考这里：feat: 支持 gpt-4-vision-preview

Issues-translate-bot commented 7 months ago

Bot detected the issue body's language is not English, translate it automatically.

gpt-4-vision-preview cannot display the complete dialogue, which is probably an openai bug. Need to pass max_token: 4096

You can refer here:

https://github.com/xcatliu/Chatgpt-sxt/commit/64e893BC09BCFA5B62BAD461A488CBDCF1 #Diff-C92AE8BA73287976525D66897 EC86C7DA0A555C871123A0DEABA2F6R170

yinbc commented 7 months ago

希望支持语音对话的功能。

Issues-translate-bot commented 7 months ago

Bot detected the issue body's language is not English, translate it automatically.

Hope to support voice conversation function.

zhuozhiyongde commented 7 months ago

Same.

jqjhl commented 7 months ago

希望增加dell-a，还有对图像的支持。

Issues-translate-bot commented 7 months ago

Bot detected the issue body's language is not English, translate it automatically.

I hope to add dell-a and support for images.

H0llyW00dzZ commented 7 months ago

希望增加dell-a，还有对图像的支持。

Model DALL·E must use own storage service for stored a image, because when you generating a image, the image will disappear in 30 minute ~ 2 hours (approx)

DirkSchlossmacher commented 7 months ago

希望增加dell-a，还有对图像的支持。

Model DALL·E must use own storage service for stored a image, because when you generating a image, the image will disappear in 30 minute ~ 2 hours (approx)

Isn't "own storage" what the app already has built-in: an Upstash Redis DB integration - now for chat messages backup, but could be extended for persisting images, at least if downscaled

H0llyW00dzZ commented 7 months ago

希望增加dell-a，还有对图像的支持。

Model DALL·E must use own storage service for stored a image, because when you generating a image, the image will disappear in 30 minute ~ 2 hours (approx)

Isn't "own storage" what the app already has built-in: an Upstash Redis DB integration - now for chat messages backup, but could be extended for persisting images, at least if downscaled

for image

In scenarios where an image is generated by DALL-E, the response is a JSON containing only the image URL, along with the date, time, and revised prompts (specifically for DALL-E 3). Since the image URL has a limited download duration, it would be beneficial if the image URL from DALL-E models could be automatically downloaded and then uploaded to a storage solution that allows for image display in markdown format.

gutenye commented 7 months ago

同类竞品: OpenCat

Issues-translate-bot commented 7 months ago

Bot detected the issue body's language is not English, translate it automatically.

Competing products: OpenCat

lph66152137 commented 7 months ago

希望增加DALL·E模型用于生成图片对话支持多模态，支持图片音频文档

Issues-translate-bot commented 7 months ago

Bot detected the issue body's language is not English, translate it automatically.

It is hoped that the DALL·E model can be added to generate pictures. The dialogue supports multi-modality and supports pictures, audio documents.

vual commented 7 months ago

已实现dall-e-3画图、gpt4-vision-preview识图、whisper语音转文字、tts文字转语音 https://github.com/vual/ChatGPT-Next-Web-Pro

Issues-translate-bot commented 7 months ago

Bot detected the issue body's language is not English, translate it automatically.

My side supports the gpt4-vision-preview image recognition function, you can check it out. https://github.com/vual/ChatGPT-Next-Web-Pro

Issues-translate-bot commented 7 months ago

Bot detected the issue body's language is not English, translate it automatically.

Drawing and audio functions will be added later.

gly3609 commented 7 months ago

同求,还希望能有联网功能

Issues-translate-bot commented 7 months ago

Bot detected the issue body's language is not English, translate it automatically.

Same request, I also hope to have networking function

boromyr commented 7 months ago

+1

kilgrims commented 7 months ago

pls

dckill commented 7 months ago

+1

liang2kl commented 7 months ago

+1

sunsky89757 commented 7 months ago

+1

vual commented 7 months ago

已实现dall-e-3画图、gpt4-vision-preview识图、whisper语音转文字、tts文字转语音 https://github.com/vual/ChatGPT-Next-Web-Pro

daidi commented 6 months ago

any update?

qinguangxu commented 5 months ago

大佬,能出个多模态的功能吗,

Issues-translate-bot commented 5 months ago

Bot detected the issue body's language is not English, translate it automatically.

Boss, can you come up with a multi-modal function?

Ssttar commented 4 months ago

桌面应用貌似不支持上传图片

Issues-translate-bot commented 4 months ago

Bot detected the issue body's language is not English, translate it automatically.

The desktop application does not seem to support uploading images.

ChatGPTNextWeb / ChatGPT-Next-Web

[Feature] 支持多模态对话功能：对话的问题和回复都支持文本、图片、音频 #3110