How to use Advanced Prompt Enhancer with OpenRouter and other remote AI services that are not preconfigured.
Prompt and Image examples from the Style_Prompt and Style_Prompt + OAI Dall-e3 node(s)
11/23/2024 @7:49am PST Version 1.21.20
Advance Prompt Enhancer (APE), Retries
Help.json
10/20/2024 @5:04pm PST Version 1.21.19
New Utility nodes:
Advance Prompt Enhancer (APE), Context Output
Advanced Prompt Enhancer, the old Parameters node has been removed from the Suite
Help.json
10/16/2024 @9:24pm PST Version 1.21.18.1
10/14/2024 @5:28pm PST Version 1.21.18
.../custom_nodes/Plush-for-ComfyUI/Example_Workflows/How_to_use_additionalParameters.png
.../custom_nodes/Plush-for-ComfyUI/Example_Workflows/How_to_use_additionalParameters.png
9/28/2024 @12:03pm PST Version 1.21.16
opt_models.txt
file:
opt_models.txt
that will now hold your user-entered model names.opt_models.txt
in the /Plush-for-ComfyUI directory is created during first startup after updating or installing the Plush node suite. It will not appear until after your first restarted of ComfyUI.opt_models.txt
has a comment header that explains how to enter the model names, and the Advanced Prompt Enhancer help output has information on the new field.9/26/2024 @9:31pm PST Version 1.21.15
custom_nodes/Plush-for-ComfyUI/optional_models.txt
. This is for AI Services, local or remote, that require a model name to be provided as part of the inference request.9/18/2024 @2:19pm PST Version 1.21.14
8/31/2024 @3:56pm PST Version 1.21.13
8/13/2024 @4:58pm PST Version 1.21.12
7/10/2024 @11:45am PST No version change
7/6/2024 @1:37pm PST Version 1.21.11
http://localhost:1234/v1/chat/completions
) url that appears on LM Studio's server screen when you first start the server.5/23/24 @08:16am PST Version 1.21.10
5/14/24 @04:27pm PST Version 1.21.9
v1/chat/completions
path in the url. It can be used for local applications that can't handle the OpenAI standard nested data structure. Instead this uses a flatter simplified data structure. If you're unable to connect to your local application with the other AI Service methods, especially if you're getting a server processing error in the 500 range, this might work for you.4/26/2024 @11:47am PST Version 1.21.8
4/14/2024 @02:01pm PST Version 1.21.7
/v1/chat/completions
path. This connection now includes both the Instruction (role:system) and Prompt (role:user) in the user submission to get around the problem of Oobabooga ignoring the system instruction.
3/24/2024 @12:16pm PST Version 1.21.6
/chat/completions
. For example a url for this type of connection would look like: http://127.0.0.1:5000/v1/chat/completions
. However when using this method of connection with Oobabooga TG it seems to only see the prompt and not the instruction or examples.LLM_KEY
if you want to use a key with an LLM front-end, API or other product. While these products are usually free, some use keys for security and privacy. If you want to use a key just create the env var with your key and it will automatically be applied to any connection with LLM products other than ChatGPT. If you have an OpenAI ChatGPT key in its own env var this will be unaffected, it will be used separately when you choose the ChatGPT connection type.
3/19/2024 @2:36PM PST Version 1.21.5
3/11/2024 @8:38PM PST Version 1.21.4
3/7/2024 @11:00PM PST Version 1.21.3
2/19/24 @12:20PM PST Version 1.20.3
2/13/24 @4:40PM PST Version 1.20
1/21/24 @7:09PM PST Revert some of the changes in Version 1.16
1/16/24 @1:00PM PST Version 1.16
1/8/24 @6:00pm PST Version 1.15
1/7/24 @4:07 PST
1/5/24 @12:02pm PST: Version 1.10
12/29/23 @4:24pm PST:
Style Prompt: Takes your: Text prompt, your image, or your text prompt and image, and the art style you specify and generates a prompt from ChatGPT3 or 4 that Stable Diffusion and/or Dall-e can use to generate an image in that style.
Advanced Prompt Enhancer: Take your: Prompt, Instruction, image, Examples and generates text output which can be a prompt or other output (e.g. caption). This node can be used with certain Open Source LLM front-ends (e.g. LM Studio) or with ChatGPT.
OAI Dall_e 3: Takes your prompt and parameters and produces a Dall_e3 image in ComfyUI.
Switch Nodes: Allows you to handle multiple multiline text inputs
Exif Wrangler: Extracts Exif and/or AI generation workflow metadata from .jpg (.jpeg) and .png images.
Install through the ComfyUI manager:
Manual install:
Follow the link to the Plush for ComfyUI Github page if you're not already here.
Click on the green Code button at the top right of the page. When the tab drops down, click to the right of the url to copy it.
Then navigate, in the command window on your computer, to the ComfyUI/custom_nodes folder and enter the command by typing git clone and pasting the url you copied after it:
git clone https://github.com/glibsonoran/Plush-for-ComfyUI.git.
cd Plush-for-ComfyUI/
python -m pip install -r requirements.txt
Requirements:
Your OpenAI API or Open Source Key [optional] (Not required for Exif Wrangler, switch nodes, or Advanced Prompt Enhancer when used with open-source LLM's):
- For the Style Prompt and Dall-e nodes, you’ll need a valid API key from OpenAI.
- For Advanced Prompt Enhancer, you'll need a valid API key if you're going to use it with ChatGPT, Anthropic or Groq models, if you're only using it with open-source LLM's, you won't need one.
- Some Open-source products use a free key for security and privacy so you have the option to create one if you want. Most of these products don't use a key, so don't worry if you don't have one.
- The OpenAI API & Anthropic keys require a paid account, if you want to use an Open-source key they are typically free. The Groq API key is free also. Generate the key from their website.
Enviroment Variable | Anthropic | Groq | OpenAI ChatGPT | Open Source (e.g. Tabby API) |
---|---|---|---|---|
OAI_KEY |
X | |||
OPENAI_API_KEY |
X | |||
LLM_KEY |
X | |||
GROQ_API_KEY |
X | |||
ANTHROPIC_API_KEY |
X |
You should set a reasonable $dollar limit on the usage of your OpenAI API key to prevent a large bill if the key is compromised. You can do this in the account setting at the OpenAI website.
Installation and usage of Plush-for-ComfyUI constitutes your acceptance of responsibility for any losses due to a compromised key. Plush-for-Comfy uses the OpenAI recommended security for storing your key (an Environment Variable) for your safety.
With regard to ChatGPT, you can choose to create a new Environment Variable specific to Plush called: OAI_KEY
and store the API key there, or if you prefer, you can use the OpenAI standard environment variable: OPENAI_API_KEY
.
Optionally you can create a key for Open-source products in the Environment Variable LLM_KEY
. While Open-source products are generally free to use, some use a key for security and privacy.
Plush looks for the 'OAI_KEY' variable first, if it's not there it looks for 'OPENAI_API_KEY'. Using the 'OAI_KEY' variable will allow you to generate a separate key for Plush and track those costs separately if your other OpenAI API apps are using the standard variable. Either way you'll need to have at least one of the two enviroment variables defined with a valid active key if you want to use ChatGPT as an inference engine. For Open-source products, once you populate 'LLM_KEY' with your key value, it will automatically be applied to all non-ChatGPT connections. Enviroment Variables for other supported AI services are shown in the table above.
If you need to make a new environment variable, see the following instructions on how to create it and set its value to your API key:
An environment variable is a variable that is set on your operating system, rather than within your application. It consists of a name and value. For a paid ChatGPT key you can set the name of the variable to: OAI_KEY
or OPENAI_API_KEY
. If you're using an Open-source product that requires or can use a key (most do not), or a remote serivce that's not preconfigured, use the environment variable: LLM_KEY
. Refer to the table above for other services. The example below only refers to 'OAI_KEY' but you can substitute the environment variable name that applies to you per the table above.
Note that after you set your Enviroment Variable, you will have to reboot your machine in order for it to take effect.
Option 1: Set your ‘OAI_KEY’ Environment Variable via the cmd prompt with admin privileges.
Run the following in the cmd prompt, replacing
setx OAI_KEY (your key)
You can validate that this variable has been set by opening a new cmd prompt window and typing in
echo %OAI_KEY%
Option 2: Set your ‘OAI_KEY’ Environment Variable through the Control Panel
Open System properties by right clicking the windows startup button and selecting "System". Then select Advanced system settings
Select Environment Variables...
Select New… from the User variables section(top). Add your name/key value pair ('OAI_KEY/'jk-####'), replacing (yourkey) with your API key.
Variable name: OAI_KEY Variable value: (yourkey)
In either case if you're having trouble and getting an invalid key response, per the instructions above, please try rebooting your machine.
Option 1: Set your ‘OAI_KEY’ Environment Variable using zsh
echo "export OAI_KEY=(yourkey)" >> ~/.zshrc
source ~/.zshrc
echo $OAI_KEY
The value of your API key will be the resulting output.
Option 2: Set your ‘OAI_KEY’ Environment Variable using bash
Follow the directions in Option 1, replacing .zshrc with .bash_profile.
You’re all set! Now Plush can load your key when you startup ComfyUI.
You can connect to remote AI services that are not preconfigured in Advanced Prompt Enhancer (APE) by following the steps below:
1) Obtain an API key from the service you want to use, you may have to pay for this.
2) If you know how to create environment variables, create one named: LLM_KEY
and enter your API key. If you don't know how to create an enviroment variable there are instructions here
3) Open the text file: .../ComfyUI/custom nodes/Plush-for-ComfyUI/opt_models.txt
Follow the instructions in the comment header and enter the names of the AI models you want to use. Make sure you use the exact model names the service requires for their API, copy and paste them if possible. They should have a web page that shows these names, OpenRouter's is here. Save the text file.
4) Start ComfyUI. In the APE node you can setup your connection to the service two different ways:
- By choosing: *OpenAI API Connection (URL)* in the AI_service pull down
- By choosing: *Direct Web Connection (URL)* in the AI_service pull down
5) Select the model you want to use in the Optional_models pull down, these will be the models you entered in the text file in step 3.
6) Enter the url for the site you want to connect to in the LLM_URL field. The OpenAI API Connection method will require a url that has a /v1
path. The Direct Web Connection method will require a url that has a /v1/chat/completions
path. The following are examples for OpenRouter:
- **OpenAI API Connection:** LLM_URL = `https://openrouter.ai/api/v1`
- **Direct Web Connection:** LLM_URL = `https://openrouter.ai/api/v1/chat/completions`
7) Connect a ShowText|pysssss node to the troubleshooting output of the APE node, then go ahead and run your workflow. If you have any issues the troubleshooting output should help you diagnose the problem.
You’ll need to have ComfyUI installed and it’s recommended that you have the Base and Refiner SDXL models as those are the models this node was designed for and tested on, it also seems to work really well with the new OpenDalle model. The Style Prompt node relies on having a model that has a broad set of images that have been carefully labeled with art style and artist. I think the SDXL base and refiner are best suited to this.
Plush requires the OpenAI Python library version 1.3.5 or later. This should be handled by the "requirements.txt" file included in this package. If you have used earlier nodes that communicate with ChatGPT you may have an early version of this library. If for some reason installing Plush doesn't upgrade this library, you can upgrade it manually by typing the command:
pip install --upgrade openai
in a directory or virtual environment where it will be applied to the installation of Python that ComfyUI is using.
Be aware that in some instances the new OpenAI API is not backward compatible and apps that use the older library may break after this upgrade.
I reccommend starting off using Style Prompt with a full SDXL Base and Refiner model, these models have the depth and labeling of art styles and artists that works well with this node. You'll find a Workflow image in the custom_nodes/Plush-for-ComfyUI/Example_workflows directory. If you want a quick setup, drag this image directly onto your ComfyUI workspace in your browser, it will automatically load the graph. The new OpenDalle model model is also reccomended. Style Prompt doesn't work well with quick print/turbo workflows like LCM that rely on low cfg values. Stable Diffusion has to implement the whole (or most) of a fairly detailed prompt in order to get the right style effect, and these workflows just don't pick everything up. At least initially I recommend you use the more basic SDXL workflows and models
New to Style Prompt is the ability to interpret images and convert them into Stable Diffusion prompts using the new ChatGPT vision model. You will be using the "gpt-4-vision-preview" model if you decide to use an image in your input, regardless of your GPTmodel selection. It's the only model that can handle image input.
You can use this feature to:
Inputs:
prompt: Your prompt, it doesn’t need to be wordy or complex, simpler prompts work better.
image (optional): Attach a "load image" or other node with an image output here. The image will be interpreted by ChatGPT and formulated into a prompt for Stable Diffusion. You can include an image alone, or an image + prompt. In the latter case both the prompt and image will be interprted by ChatGPT. When an image is included for interpretation, Style Prompt will automatically use the OpenAI "Vision" model (gpt-4-vision-preview) instead of the model selected in the "GPTmodel" field. This is because it's the only ChatGPT model that will accept image input.
example (optional): A text example of how you want ChatGPT’s prompt to look. There’s a default example in Style Prompt that works well, but you can override it if you like by using this input. Examples are mostly for writing style, it doesn’t matter if they pertain to the same subject as your prompt.
Outputs:
CGPTprompt: The prompt ChatGPT generates for your image, this should connect to the CLIP node. Alternatively you can have a text display node either in-line between Style Prompt and the CLIP node, or as a separate branch off this output. In either case a text display node will show you the ChatGPT generated prompt.
CGPTInstruction (optional): This will show you the instruction that was sent to ChatGPT along with the prompt. The instruction tells ChatGPT how to treat the prompt. It’s pretty much the same every time so typically it’s not worth hooking up after you’ve seen a couple.
Style Info (optional): If the style_info UI control is set to “true”, this will output a brief backgrounder on the art style you’ve chosen: This will display important characteristics of the style, its history and the names of some artists who have been influential in that style. This will require connecting it to a text display box if you’re going to use it.
Help: Hook up a text display node to this output and press the Queue button to see a brief help file that explains the functions of the UI Input elements.
UI inputs:
GPTModel (default gpt-4): The ChatGPT model that’s going to generate the prompt. GPT-4 works better than GPT-3.5 turbo, but 3.5 is slightly cheaper to use. The new GPT-4Turbo is now included.
Creative_lattitude (default 0.7): This is very similar to cfg in the KSampler. It’s how much freedom the AI model has to creatively interpret your prompt, example and instruction. Small numbers make the model stick closely to your input, larger ones give it more freedom to improvise. The actual range is from 0.1 to 2.0, but I’ve found that anything above 1.1 or 1.2 is just disjointed word salad. So I’ve limited the range to 1.2, and even then I don’t go above 0.9.
Tokens (default 500): A limit on how many tokens ChatGPT can use in providing your prompt. Paid use of the API is based on the number of tokens used. This isn’t how many ChatGPT will use, it’s a limit on how many it can use. If you want to strictly control costs you can play around with the minimum number of tokens that will get you a good prompt. I just leave it at 500.
Style (default Photograph): This is the heart of Style Prompt. I’ve included a list of dozens of art styles to choose from and my instructions tell ChatGPT to build the prompt in a way that pertains to the chosen style. It’s ChatGPT’s interpretation of the art style, knowledge of artists that work in that style, and what descriptive elements best relate to that style that makes the node effective at depicting the various styles.
Artist (default 1, range: 0 - 3): Whether to include a “style of” statement with the name of 1 to 3 artist(s) that exemplify the style you’ve chosen. Style Prompt is better at depicting the chosen style if this is set to 1 or greater. If you don't want to include an artist, set this to 0.
prompt_style (default, Tags): Let's you choose between two types of prompts: Narrative: A prompt style that is long form creative writing with grammatically correct sentences. This is the preferred form for Dall_e. Tags: A prompt style that is terse, a stripped down list of visual elements without conjunctions or grammatical phrasing. This is the preferred form for Stable Diffusion and Midjourney.
Max_elements (default 10): The maximum number of descriptive elements for ChatGPT to include in its generated prompt. Stable Diffusion gives the highest weighting to text at the beginning of the prompt, and the weighting falls off from there. There’s definitely a point where long wordy SD prompts result in diminishing returns. This input lets you limit the length of your prompt. The range here is from 3 to 25. I think 6 to 10 works about the best.
Style_info (default false): If this is set to true, Style Prompt will send a second request to ChatGPT to provide a description of the chosen style, historical information about it, and information on some of the most influential artists in that style.
Prompt: Fish-Eye lens Photograph of a joyful young woman on a bustling downtown street, her smile amplified by the distorted perspective, skyscrapers curving around her in a surreal fishbowl effect, their windows reflecting the radiant midday sun, the surrounding crowd and traffic appearing as miniature figures in the margins, parked cars stretched and skewed into bizarre shapes, the blue sky overhead warped into a swirling dome, style of Justin Quinnell.
Prompt: High Key Photography of a noir-era actress, vividly red lipstick, sparkling diamond jewelry, soft-focus background, luxurious fur stole, pearlescent lighting effects, dramatic high-contrast shadows, and mirrored reflections, style of Terry O'Neill.
Prompt: Digital Art, female portrait, abstract elements, red hair, polka dots, vibrant colors, contrast, geometric shapes, surrealism, bold makeup, dripping paint effect, large eyes, stylized features, style of Patrice Murciano, style of Aya Kato.
Prompt: Fantasy Art of a radiant young woman, her eyes glowing with an ethereal light, clad in a cloak of starlight, amidst a sprawling urban jungle, buildings bathed in the soft hues of twilight, with the stylized graffiti murals pulsating with arcane energy, under the watchful gaze of celestial constellations, style of Yoshitaka Amano.
(Dall-e3 node) Prompt: Chiaroscuro Art: female warrior, profile view, low key lighting, contrast of shadow and light, detailed battle dress, animal skins, flowing black hair, blood smeared face, victory shine, defiant wind, exhalation pose, dark stormy sky, distant lightning, triumphant spear thrust, high heels, leather wristbands, feathered necklace, steel breastplate, war paint stripes, rusted spear, cracked shield, muddy battlefield, fallen enemies, style of Mario Testino.
Prompt: Low Key Photograph of a young woman, her features highlighted by a single, dramatic light source, cradling a small dog in her arms, the dog's coat a play of shadow and sheen, against a backdrop of deep, impenetrable shadows, the surrounding space filled with soft whispers of darkness, the barest hint of a window barely discernible in the background, the light creating a stark contrast between subject and surrounding, style of Bill Henson.
(Dall-e3 node) Prompt: High Key Photograph of a sun-bleached Sonoran desert landscape, the towering silhouette of a saguaro cactus against a bright, cloudless sky, crescent shaped sand dunes under harsh midday illumination, distant mountains reduced to ghostly outlines, the play of light and shadow accentuating the textures of the desert, each grain of sand aglow, all bathed in an intense, blinding light, style of Michael Frye
Prompt: Fashion Sketch of a statuesque model draped in a flowing grey dress, adorned with vibrant yellow accents, posed against a minimalist white background, with sharp, angular lines defining her silhouette, a dramatic contrast of shadow and light to highlight the fabric's texture, her gaze focused and intense, emanating an air of sophistication, her enigmatic smile hinting at a story untold, and a yellow hat as the final touch, style of Hayden Williams.
Prompt: Biomorphic Abstraction, surreal portrait, female figure, high-contrast, oversized eyes, glossy lips, polychromatic splashes, geometric shapes, dripping paint, monochromatic background, stylized features, sharp shadows, dynamic composition, style of Kandinsky, style of Joan Miró.
(Dall-e3 node) Prompt: Long Exposure Photograph capturing a solitary blue sailboat with its sails fully unfurled, gliding over a smooth, glass-like sea under the ethereal glow of a full moon. The image is framed to emphasize the stark contrast between the deep, velvety blues of the night sea and the subtle, shimmering silver path created by the moonlight. The sailboat is positioned slightly off-center, sailing towards the right, inviting the viewer's gaze to follow its journey. The surrounding darkness envelops the scene, with the moon's reflection acting as the main source of illumination, creating a serene yet mysterious atmosphere. The composition is minimalist, focusing on the interplay of light and shadow, the texture of the sailboat against the liquid mirror of the sea, and the infinite horizon merging sea and sky into one. Style of Michael Kenna.
Prompt: Art Deco of a poised young woman in a sleek, geometrically patterned dress, her sharp silhouette highlighted against a jeweled sunset, standing on the crest of a manicured grassy hill, her eyes glinting with reflected urban skylines composed of streamlined skyscrapers in the distance, her hands softly clutching an elegant sequined clutch and a feathered hat delicately perched on her bobbed hair, all under the radiant glow of a large, low-hanging moon, style of Tamara De Lempicka.
Prompt: Zulu Urban Art, detailed female portrait, half-shaved head with blonde hair, geometric patterns, bold contrasts, abstract shapes, vibrant colors, dripping paint, surreal composition, expressive eyes, red lips, polka dots, modern fashion, style of Kobra, Shepard Fairey.
Prompt: Origami of a poised young woman crafted from intricate, emerald-green folds standing tall on a textured, grassy hill, with a meticulously folded skyline of a bustling city in the distance, under a sweeping blue paper sky, the sun casting long, dramatic shadows, all bathed in soft, warm light, style of Robert J. Lang.
Prompt: Fashion Art of a sophisticated young woman standing central, adorned in an avant-garde, voluminous tulle gown cascading to the grassy hill under her feet, an ornate oversized feather hat on her head, peering into the distance with a mysterious, melancholic gaze, her body illuminated by the glowing moon, the sprawling city skyline serving as a contrasting backdrop, in the strikingly dramatic style of Alexander McQueen.
Style: Photograh, Example of Image +Prompt Input
=======
I’m not going to go into detail about this node. The main thing is that it takes your prompt and outputs an image. Right now it’s only setup to use dall_e3 as the required input values are too different for me to include dall_e2. Dalle_e3 produces better images so I just didn’t think accommodating Dall_e2 was worth it.
You should be aware that in the API implementation Dall_e completely rewrites your prompt in an attempt to control misuse. The text of that rewritten prompt is what is produced by the Dall_e_prompt output in this node. This can create some odd results, and some prompts will generate a ComfyUI error as Dall_e reports that the prompt violates their policies. This can happen even with very benign subject matter. Right now I think the Dall_e engine built into the Edge browser gives better results than the API, but every once in a while this will produce a winner.
=======