Open easeaico opened 8 months ago
Bot detected the issue body's language is not English, translate it automatically.
Title: [Feature] Support multi-modal dialogue function: dialogue questions and replies support text, pictures, and audio
What features do you want or have any suggestions? Support multi-modal dialogue function: dialogue questions and replies support text, pictures, and audio. With the launch of official ChatGPT multi-modality, we look forward to ChatGPT-Next-Web planning to support multi-modal conversation input and output in the future.
Are there any similar competing products that we can refer to? Official ChatGPT multi-modal functionality
希望支持多模态对话功能:对话的问题和回复都支持文本、图片、音频。
Bot detected the issue body's language is not English, translate it automatically.
Hope to support multi-modal dialogue function: dialogue questions and replies support text, pictures, and audio.
希望支持生成图片,语音,
Bot detected the issue body's language is not English, translate it automatically.
Hope to support generating pictures, voices,
GPT-4 Turbo with vision https://openai.com/blog/new-models-and-developer-products-announced-at-devday
希望支持多模态对话功能:对话的问题和回复都支持文本、图片、音频。同时支持文件上传,服务器缓存文件功能。
Bot detected the issue body's language is not English, translate it automatically.
Hope to support multi-modal dialogue function: dialogue questions and replies support text, pictures, and audio. It also supports file upload and server caching file functions.
github许愿池
Bot detected the issue body's language is not English, translate it automatically.
github wishing pool
@Yidadaa would you appreciate any work on this (e.g. PoC) that you can leverage? Or are you in the middle of a major refactoring of the related app architecture?
多模态太重要了, 会带来无限可能.
Bot detected the issue body's language is not English, translate it automatically.
Multimodality is so important and will bring endless possibilities.
Multi-modal support is desired: support for external file uploads and server caching, and support for conversations and replies in text, images, audio, and video.
gpt-4-vision-preview 无法显示完整对话,估计是 openai 的 bug。需要传 max_token: 4096
Bot detected the issue body's language is not English, translate it automatically.
gpt-4-vision-preview cannot display the complete dialogue, which is probably an openai bug. Need to pass max_token: 4096
You can refer here:
https://github.com/xcatliu/Chatgpt-sxt/commit/64e893BC09BCFA5B62BAD461A488CBDCF1 #Diff-C92AE8BA73287976525D66897 EC86C7DA0A555C871123A0DEABA2F6R170
希望支持语音对话的功能。
Bot detected the issue body's language is not English, translate it automatically.
Hope to support voice conversation function.
Same.
希望增加dell-a,还有对图像的支持。
Bot detected the issue body's language is not English, translate it automatically.
I hope to add dell-a and support for images.
希望增加dell-a,还有对图像的支持。
Model DALL·E must use own storage service for stored a image, because when you generating a image, the image will disappear in 30 minute ~ 2 hours (approx)
希望增加dell-a,还有对图像的支持。
Model DALL·E must use own storage service for stored a image, because when you generating a image, the image will disappear in 30 minute ~ 2 hours (approx)
Isn't "own storage" what the app already has built-in: an Upstash Redis DB integration - now for chat messages backup, but could be extended for persisting images, at least if downscaled
希望增加dell-a,还有对图像的支持。
Model DALL·E must use own storage service for stored a image, because when you generating a image, the image will disappear in 30 minute ~ 2 hours (approx)
Isn't "own storage" what the app already has built-in: an Upstash Redis DB integration - now for chat messages backup, but could be extended for persisting images, at least if downscaled
for image
In scenarios where an image is generated by DALL-E, the response is a JSON containing only the image URL, along with the date, time, and revised prompts (specifically for DALL-E 3). Since the image URL has a limited download duration, it would be beneficial if the image URL from DALL-E models could be automatically downloaded and then uploaded to a storage solution that allows for image display in markdown format.
同类竞品: OpenCat
Bot detected the issue body's language is not English, translate it automatically.
Competing products: OpenCat
希望增加DALL·E模型用于生成图片 对话支持多模态,支持图片 音频 文档
Bot detected the issue body's language is not English, translate it automatically.
It is hoped that the DALL·E model can be added to generate pictures. The dialogue supports multi-modality and supports pictures, audio documents.
已实现dall-e-3画图、gpt4-vision-preview识图、whisper语音转文字、tts文字转语音 https://github.com/vual/ChatGPT-Next-Web-Pro
Bot detected the issue body's language is not English, translate it automatically.
My side supports the gpt4-vision-preview image recognition function, you can check it out. https://github.com/vual/ChatGPT-Next-Web-Pro
Bot detected the issue body's language is not English, translate it automatically.
Drawing and audio functions will be added later.
同求,还希望能有联网功能
Bot detected the issue body's language is not English, translate it automatically.
Same request, I also hope to have networking function
+1
pls
+1
+1
+1
已实现dall-e-3画图、gpt4-vision-preview识图、whisper语音转文字、tts文字转语音 https://github.com/vual/ChatGPT-Next-Web-Pro
any update?
大佬,能出个多模态的功能吗,
Bot detected the issue body's language is not English, translate it automatically.
Boss, can you come up with a multi-modal function?
桌面应用貌似不支持上传图片
Bot detected the issue body's language is not English, translate it automatically.
The desktop application does not seem to support uploading images.
你想要什么功能或者有什么建议? 支持多模态对话功能:对话的问题和回复都支持文本、图片、音频。 随着官方 ChatGPT 多模态的推出,期待未来 ChatGPT-Next-Web 有计划支持多模态的对话输入输出。
有没有可以参考的同类竞品? 官方 ChatGPT 多模态的功能