danny-avila / LibreChat

Enhanced ChatGPT Clone: Features Anthropic, AWS, OpenAI, Assistants API, Azure, Groq, o1, GPT-4o, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message search, langchain, DALL-E-3, ChatGPT Plugins, OpenAI Functions, Secure Multi-User System, Presets, completely open-source for self-hosting. Actively in public development.
https://librechat.ai/
MIT License
17.39k stars 2.89k forks

Enhancement: Multiple Dall-e 3 generations per message #1354

Closed: MoezGholami closed this issue 7 months ago

MoezGholami commented 9 months ago

What features would you like to see added?

Having multiple Dall-e 3 images generated per message.

More details

In online communities, one of the major complaints about ChatGPT and Dall-e 3 is that the number of images generated per message has dropped from 4 to 1. This has been a major disappointment for many people (some examples: 1, 2, 3, 4, 5).

As far as I can tell, there is currently no way to go back to how things were before. The current alternatives are:

  1. Midjourney: even if we agree that Midjourney understands prompts as well as Dall-e does, its interface does not allow it to be embedded in conversations with chatbots.
  2. Bing Chat: Bing Chat does not have the 1 (or 2) images-per-message constraint, and it is also powered by ChatGPT and Dall-e 3. The problem is that there is no way to find and continue previous chats in Bing; also, people are typically limited to 5 messages per conversation.
  3. Stable Diffusion (and other similar models): prompt comprehension is vastly inferior to Dall-e 3. Also, the ecosystem around it is complex and rapidly changing; in other words, it doesn't just work.

Implementing similar functionality in LibreChat would not only solve all these issues but, for people who pay for the API, could also significantly reduce wait times (less throttling for API users).

Relevant issue: #1289

Which components are impacted by your request?

No response

Pictures

No response

Code of Conduct

MoezGholami commented 9 months ago

Looking at the implementation details, I see that Dall-e.js should use structured LangChain tools, which is not the hardest part. The more complex part is the front-end view, since the current abstraction (a Markdown-based view) does not readily support it.
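The comment above describes two separate changes: giving the image tool a structured, typed input, and rendering a reply as separate parts instead of one Markdown string. Below is a minimal Python sketch of both ideas, assuming nothing about LibreChat's internals. It does not use LangChain's API, and the names `ImageToolInput`, `TextPart`, `ImagePart`, `build_message_parts` and the `MAX_IMAGES` cap are hypothetical, chosen only for illustration. The three allowed sizes are the ones OpenAI documents for dall-e-3.

```python
from dataclasses import dataclass
from typing import List, Literal, Union

# Hypothetical cap on images per message, chosen only for illustration.
MAX_IMAGES = 4

# The three output sizes OpenAI documents for dall-e-3.
DALLE3_SIZES = ("1024x1024", "1792x1024", "1024x1792")


@dataclass
class ImageToolInput:
    """Hypothetical structured input for an image tool: typed fields that can be
    validated, instead of one free-form string the tool has to parse."""
    prompt: str
    n: int = 1
    size: Literal["1024x1024", "1792x1024", "1024x1792"] = "1024x1024"

    def __post_init__(self) -> None:
        # Type hints are not enforced at runtime, so validate explicitly.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if not 1 <= self.n <= MAX_IMAGES:
            raise ValueError(f"n must be between 1 and {MAX_IMAGES}")
        if self.size not in DALLE3_SIZES:
            raise ValueError(f"unsupported size: {self.size}")


@dataclass
class TextPart:
    text: str
    type: str = "text"


@dataclass
class ImagePart:
    url: str
    prompt: str
    type: str = "image"


def build_message_parts(
    summary: str, image_urls: List[str], prompt: str
) -> List[Union[TextPart, ImagePart]]:
    """Return an ordered list of typed parts that a UI could render one at a
    time, rather than packing every image into a single Markdown string."""
    parts: List[Union[TextPart, ImagePart]] = [TextPart(text=summary)]
    parts.extend(ImagePart(url=url, prompt=prompt) for url in image_urls)
    return parts
```

For example, `ImageToolInput(prompt="a red fox in snow", n=3)` passes validation, while `n=10` raises `ValueError`. Keeping each image as its own part is one way a front end could lay images out as a grid or show each one as soon as it arrives, which is harder when the whole reply is a single Markdown string.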

danny-avila commented 9 months ago

This is planned with the Assistants API integration, and I've already experimented with this and it's very possible. LibreChat will be able to generate 10 images in the time it takes ChatGPT to generate one.
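The speed-up described above comes from concurrency. OpenAI's image endpoint accepts only `n=1` per request for dall-e-3, so several images means several requests, and sending them concurrently makes the total wait close to that of one request rather than the sum of all of them. The following is a minimal asyncio sketch of that fan-out, with a semaphore as a simple cap on requests in flight. `generate_image` is a hypothetical stub that sleeps instead of calling any real API, the `example.invalid` URLs are placeholders, and this is not the implementation merged in #1696.

```python
import asyncio
import time
from typing import List


async def generate_image(prompt: str, index: int) -> str:
    """Hypothetical stand-in for one single-image API call.
    Sleeps to simulate network latency and returns a placeholder URL."""
    await asyncio.sleep(0.2)  # assumed per-request latency, for illustration only
    return f"https://example.invalid/{index}.png"


async def generate_many(prompt: str, count: int, max_concurrency: int = 5) -> List[str]:
    """Send `count` single-image requests concurrently, with at most
    `max_concurrency` in flight at once (a crude guard against rate limits)."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(i: int) -> str:
        async with semaphore:
            return await generate_image(prompt, i)

    # gather() returns results in the order the awaitables were passed,
    # regardless of which request finishes first.
    results = await asyncio.gather(*(one(i) for i in range(count)))
    return list(results)


if __name__ == "__main__":
    start = time.perf_counter()
    urls = asyncio.run(generate_many("a lighthouse at dusk", count=10, max_concurrency=10))
    print(f"{len(urls)} images in {time.perf_counter() - start:.2f}s")
```

With the assumed 0.2 s latency, ten sequential requests would take about 2 s; with `max_concurrency=10` they finish in roughly 0.2 s, and with the default cap of 5 they run in two waves. A real deployment would be bounded by the provider's rate limits, which this sketch does not model.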

danny-avila commented 7 months ago

Current progress on this is here: https://x.com/lgtm_hbu/status/1749865914315530361?s=20

MoezGholami commented 7 months ago

@danny-avila, if you have the code for the current progress somewhere, please share it. I (and, I bet, other people) can contribute to this issue now, and we don't want to redo what you've already done (thanks, btw).

fuegovic commented 7 months ago

@MoezGholami

Here it is: https://github.com/danny-avila/LibreChat/tree/assistants and https://github.com/danny-avila/LibreChat/pull/1696

danny-avila commented 7 months ago

@MoezGholami

Here it is: https://github.com/danny-avila/LibreChat/tree/assistants #1696

The code is about to be merged within the next 24 hours.

danny-avila commented 7 months ago

Closed by https://github.com/danny-avila/LibreChat/pull/1696