enricoros / big-AGI

Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.
https://big-agi.com
MIT License
5.25k stars 1.19k forks

[Roadmap] DALL-E prompts #342

Open vadimkatsman opened 8 months ago

vadimkatsman commented 8 months ago

Why

It seems that when I use the "/draw" directive, my prompt is sent to a ChatGPT model to generate a prompt for DALL-E. This produces random results, with too many ways to get it wrong. I would prefer a choice between using ChatGPT to write the image prompt and sending the prompt exactly as I typed it.

Concise description

Ability to send image creation instructions directly to DALL-E.
Option to choose between the original prompt (as I typed it) and a generated prompt for image creation.

enricoros commented 8 months ago

Hi @vadimkatsman this is already possible. Go to the Dall-E configuration options, advanced, and uncheck the "rewrite prompt" box.

DALL-E rewrites all prompts by default, and big-AGI provides a mechanism that lets you send the prompt straight through.

enricoros commented 8 months ago

Note: the prompt rewrite is done by default on the OpenAI servers (so the prompt is not sent to ChatGPT and then to DALL-E by big-AGI); it's the OpenAI DALL-E API that uses ChatGPT behind the scenes. By using the option above, we have a way to disable the default OpenAI DALL-E behavior, and it works well.
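
As a rough illustration of sending the prompt straight, here is a minimal sketch. It assumes the OpenAI image generation request shape (`model`, `prompt`, `n`, `size`) and the `revised_prompt` field that DALL-E 3 responses carry. The instruction prefix follows wording suggested in OpenAI's DALL-E 3 guidance; it is not confirmed to be what big-AGI sends, and it discourages the rewrite rather than guaranteeing it is off. The helper names `build_image_request` and `was_rewritten` are hypothetical, and no network call is made.

```python
from typing import Any

# Assumption: an instruction prefix like this discourages the server-side
# rewrite. It is not a guaranteed switch, nor necessarily big-AGI's wording.
NO_REWRITE_PREFIX = (
    "I NEED to test how the tool works with extremely simple prompts. "
    "DO NOT add any detail, just use it AS-IS: "
)


def build_image_request(prompt: str, rewrite: bool = True,
                        size: str = "1024x1024") -> dict[str, Any]:
    """Build the JSON body for an image generation request (hypothetical helper)."""
    text = prompt if rewrite else NO_REWRITE_PREFIX + prompt
    return {"model": "dall-e-3", "prompt": text, "n": 1, "size": size}


def was_rewritten(original: str, response: dict[str, Any]) -> bool:
    """Compare the prompt we sent with the `revised_prompt` echoed back, if any."""
    revised = response["data"][0].get("revised_prompt")
    return revised is not None and revised.strip() != original.strip()
```

Checking `revised_prompt` after each generation is one way to verify whether the text that reached the image model matches what was typed.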

enricoros commented 8 months ago

This is the setting I'm talking about; uncheck "Better Prompt" (which OpenAI makes very difficult to circumvent):

[image]

vadimkatsman commented 8 months ago

I assume "Improve" ON means using the default OpenAI behavior, and I need to switch it off, right?

vadimkatsman commented 8 months ago

On another note, preferences are what they are: common defaults. Certain projects may require some specifics, like a different image resolution.

Hence the suggestion: allow parameters to be specified per conversation, during prompting or chatting. The options would be pre-populated from the defaults, but the actual values could be set for each conversation. (The suggestion applies to both image and text tasks.)

Playing with preferences is not only a usability issue. If one has to change preferences for the needs of a new chat, it becomes almost impossible to continue prior work in progress in other chats, since the beginning of a thread would have been produced with different settings than its continuation.
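
The per-conversation idea could look roughly like the sketch below. All names (`ImageSettings`, `SettingsStore`, the field names) are hypothetical and do not describe big-AGI's actual data model. Each conversation takes a snapshot of the defaults the first time it is used, so later changes to the defaults do not alter work already in progress.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ImageSettings:
    """Hypothetical image options; frozen so snapshots cannot change by accident."""
    size: str = "1024x1024"
    rewrite_prompt: bool = True


class SettingsStore:
    """Global defaults plus one settings snapshot per conversation."""

    def __init__(self, defaults: ImageSettings) -> None:
        self.defaults = defaults
        self._per_conversation: dict[str, ImageSettings] = {}

    def for_conversation(self, conversation_id: str) -> ImageSettings:
        # Pre-populate from the current defaults on first use, then keep them.
        return self._per_conversation.setdefault(conversation_id, self.defaults)

    def update(self, conversation_id: str, **changes) -> ImageSettings:
        # Change settings for one conversation only.
        updated = replace(self.for_conversation(conversation_id), **changes)
        self._per_conversation[conversation_id] = updated
        return updated


store = SettingsStore(ImageSettings())
store.update("chat-a", size="1792x1024", rewrite_prompt=False)
```

Here `chat-a` gets its own resolution and rewrite setting, while every other conversation keeps whatever defaults were in effect when it started.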

vadimkatsman commented 8 months ago

I followed your advice and unchecked the "prompt" option for DALL-E. My expectation is that the model will receive my text as opposed to a GPT-generated one.

Here is my prompt: "Create an image representing a future of digital transformation helping solve employee retention issues. The image should encapsulate various solutions, as suggested in current dialogues around the topic. Visualize a progressive office space with modern technology implemented, diverse employees of various descents and genders working collaboratively, and real-time data analytics displayed on interactive screens to aid decision-making. Perhaps include digital tools such as an advanced HR management system, collaboration software, and key strategies for workplace satisfaction like flexible working hours and well-being programs. The image should not have any text labels.".

This is how it was rewritten by the generator before being sent to DALL-E: "Create an image representing a futuristic office setting indicative of digital transformation acting as a solution to employee retention crises. There is a diverse workforce involved, including an Asian female data analyst, a Middle-Eastern male software engineer, a Caucasian female HR manager, a Hispanic male graphic designer, and a South Asian female project manager. They're engaging in a collaborative work, utilizing high-tech devices and doing their jobs efficiently. Imagine an advanced HR management system and other digital tools in use, embodying the theme of collaborative software. The workspace offers key strategies for workplace satisfaction, such as flexible work schedules and wellness programs. Interactive screens showing real-time data analytics for informed decision-making are prominently displayed. Please do leave out any text labels in this image."

The fundamental problem I am working on is removing text labels. Based on what I have learned, if DALL-E is called directly, it honors the request not to use any text labels. When it is called with a GPT-generated prompt, it keeps adding text labels, which are even worse than human faces: either full of errors or illegible.

vadimkatsman commented 8 months ago

@enricoros,

Note: the prompt rewrite is done by default on the OpenAI servers (so the prompt is not sent to ChatGPT and then to DALL-E by big-AGI); it's the OpenAI DALL-E API that uses ChatGPT behind the scenes. By using the option above, we have a way to disable the default OpenAI DALL-E behavior, and it works well.

I looked at a few ChatGPT Pro tutorials on using the DALL-E 3 model. In one of them, the GPT UI offered a few prompts based on the user's original prompt and gave a choice of which one to use to instruct the DALL-E model. I assume they disabled DALL-E's embedded prompt corrections, as the switch discussed in this issue allows you to do.

However, here is what was important in that tool:

  1. Prompt management:
    • choice between the original and ChatGPT-improved prompts
    • ability to choose between multiple generated prompts
    • ability to take an improved prompt and make final manual corrections before submitting it to DALL-E for generation

Living without improved prompts is hard: the machine is able to generate fairly creative prompts that instruct image generation more precisely. But the machine is random by design, and it hallucinates, so final editing of the generated instructions should be available.
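
The prompt-management flow described above (pick the original or one of several generated prompts, then optionally hand-edit before submitting) can be sketched as follows. `PromptDraft` and its methods are hypothetical names used only for illustration; how the generated candidates are produced is out of scope here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PromptDraft:
    """Holds the typed prompt, generated alternatives, and the user's decision."""
    original: str
    generated: list[str] = field(default_factory=list)
    choice: int = 0                      # 0 = original, 1..n = generated prompts
    manual_edit: Optional[str] = None    # final hand-edited text, if any

    def candidates(self) -> list[str]:
        return [self.original, *self.generated]

    def choose(self, index: int) -> None:
        if not 0 <= index < len(self.candidates()):
            raise IndexError(f"no candidate prompt at index {index}")
        self.choice = index
        self.manual_edit = None          # a new choice discards earlier edits

    def edit(self, text: str) -> None:
        self.manual_edit = text

    def final(self) -> str:
        """The text that would actually be submitted for image generation."""
        if self.manual_edit is not None:
            return self.manual_edit
        return self.candidates()[self.choice]
```

Keeping the original prompt as candidate 0 means the user can always fall back to exactly what was typed.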

  2. Image refinement
    • ability to refer to one of the generated images and request a refinement, as opposed to another random retry (by "random" I mean an independent attempt to generate an image, as opposed to using an already generated image as a base)

In my example above, the end result was decent, but I needed to ask for a few corrections. One correction was to fix faces (AI-generated faces sometimes mix human and animal features, but refinement usually gets the job done eventually). Another was to remove the dashboard while leaving the rest of the setting, the people's positions, etc. intact. So my prompt was "using the just generated image above please remove a dashboard". The result was totally unexpected. Instead of the already generated image (office conference room, buildings around, etc.) with the element removed (the big billboard I wanted to remove), it generated a black-and-white postcard from 1961 (the famous postcard of the first cosmonaut) with the words "no dashboard" at the top of the card and an arrow pointing up (an interpretation of the word "above" in the prompt). Ultimately, this had no correlation with any of the prompts, with the context of the chat in which the image generation was requested, or with prior requests.

However, DALL-E itself clearly supports refinement of a previously generated image, taking into account the context in which the generation is requested.

enricoros commented 7 months ago

In the implementation pipeline. Reopening because suggestions are good.