glibsonoran / Plush-for-ComfyUI

Custom node for ComfyUI/Stable Diffustion
GNU General Public License v3.0
154 stars 15 forks source link
anthropic anthropic-claude chatgpt comfyui comfyui-nodes groq groq-api openai openai-api stable-diffusion

Plush-for-ComfyUI


10/16/2024 @9:24pm PST Version 1.21.18.1

10/14/2024 @5:28pm PST Version 1.21.18

Style Prompt: Takes your: Text prompt, your image, or your text prompt and image, and the art style you specify and generates a prompt from ChatGPT3 or 4 that Stable Diffusion and/or Dall-e can use to generate an image in that style.

Advanced Prompt Enhancer: Take your: Prompt, Instruction, image, Examples and generates text output which can be a prompt or other output (e.g. caption). This node can be used with certain Open Source LLM front-ends (e.g. LM Studio) or with ChatGPT.

OAI Dall_e 3: Takes your prompt and parameters and produces a Dall_e3 image in ComfyUI.

Switch Nodes: Allows you to handle multiple multiline text inputs

Exif Wrangler: Extracts Exif and/or AI generation workflow metadata from .jpg (.jpeg) and .png images.


Installation:

Install through the ComfyUI manager:

Follow the link to the Plush for ComfyUI Github page if you're not already here.

Click on the green Code button at the top right of the page. When the tab drops down, click to the right of the url to copy it.

alt text

Then navigate, in the command window on your computer, to the ComfyUI/custom_nodes folder and enter the command by typing git clone and pasting the url you copied after it:

git clone https://github.com/glibsonoran/Plush-for-ComfyUI.git.

cd Plush-for-ComfyUI/

python -m pip install -r requirements.txt

Requirements:

Your OpenAI API or Open Source Key [optional] (Not required for Exif Wrangler, switch nodes, or Advanced Prompt Enhancer when used with open-source LLM's):

  • For the Style Prompt and Dall-e nodes, you’ll need a valid API key from OpenAI.
  • For Advanced Prompt Enhancer, you'll need a valid API key if you're going to use it with ChatGPT, Anthropic or Groq models, if you're only using it with open-source LLM's, you won't need one.
  • Some Open-source products use a free key for security and privacy so you have the option to create one if you want. Most of these products don't use a key, so don't worry if you don't have one.
  • The OpenAI API & Anthropic keys require a paid account, if you want to use an Open-source key they are typically free. The Groq API key is free also. Generate the key from their website.

The follwing table lists the Enviroment Variables that Plush recognizes and how the API keys they contain are applied.

Enviroment Variable Anthropic Groq OpenAI ChatGPT Open Source (e.g. Tabby API)
OAI_KEY X
OPENAI_API_KEY X
LLM_KEY X
GROQ_API_KEY X
ANTHROPIC_API_KEY X

How to Setup Your Environment Variables

An environment variable is a variable that is set on your operating system, rather than within your application. It consists of a name and value. For a paid ChatGPT key you can set the name of the variable to: OAI_KEY or OPENAI_API_KEY. If you're using an Open-source product that requires or can use a key (most do not), or a remote serivce that's not preconfigured, use the environment variable: LLM_KEY. Refer to the table above for other services. The example below only refers to 'OAI_KEY' but you can substitute the environment variable name that applies to you per the table above.

Note that after you set your Enviroment Variable, you will have to reboot your machine in order for it to take effect.

Windows Set-up

Option 1: Set your ‘OAI_KEY’ Environment Variable via the cmd prompt with admin privileges.

Run the following in the cmd prompt, replacing with your API key:

setx OAI_KEY (your key)

You can validate that this variable has been set by opening a new cmd prompt window and typing in

echo %OAI_KEY%

Option 2: Set your ‘OAI_KEY’ Environment Variable through the Control Panel

  1. Open System properties by right clicking the windows startup button and selecting "System". Then select Advanced system settings

  2. Select Environment Variables...

  3. Select New… from the User variables section(top). Add your name/key value pair ('OAI_KEY/'jk-####'), replacing (yourkey) with your API key.

Variable name: OAI_KEY Variable value: (yourkey)

In either case if you're having trouble and getting an invalid key response, per the instructions above, please try rebooting your machine.

Linux / MacOS Set-up

Option 1: Set your ‘OAI_KEY’ Environment Variable using zsh

  1. Run the following command in your terminal, replacing yourkey with your API key.

echo "export OAI_KEY=(yourkey)" >> ~/.zshrc

  1. Update the shell with the new variable:

source ~/.zshrc

  1. Confirm that you have set your environment variable using the following command.

echo $OAI_KEY

The value of your API key will be the resulting output.

Option 2: Set your ‘OAI_KEY’ Environment Variable using bash

Follow the directions in Option 1, replacing .zshrc with .bash_profile.

You’re all set! Now Plush can load your key when you startup ComfyUI.


How to connect to OpenRouter

You can connect to remote AI services that are not preconfigured in Advanced Prompt Enhancer (APE) by following the steps below:

1) Obtain an API key from the service you want to use, you may have to pay for this.

2) If you know how to create environment variables, create one named: LLM_KEY and enter your API key. If you don't know how to create an enviroment variable there are instructions here

3) Open the text file: .../ComfyUI/custom nodes/Plush-for-ComfyUI/opt_models.txt Follow the instructions in the comment header and enter the names of the AI models you want to use. Make sure you use the exact model names the service requires for their API, copy and paste them if possible. They should have a web page that shows these names, OpenRouter's is here. Save the text file.

4) Start ComfyUI. In the APE node you can setup your connection to the service two different ways:

- By choosing: *OpenAI API Connection (URL)* in the AI_service pull down
- By choosing: *Direct Web Connection (URL)* in the AI_service pull down

5) Select the model you want to use in the Optional_models pull down, these will be the models you entered in the text file in step 3.

6) Enter the url for the site you want to connect to in the LLM_URL field. The OpenAI API Connection method will require a url that has a /v1 path. The Direct Web Connection method will require a url that has a /v1/chat/completions path. The following are examples for OpenRouter:

- **OpenAI API Connection:** LLM_URL = `https://openrouter.ai/api/v1`
- **Direct Web Connection:** LLM_URL = `https://openrouter.ai/api/v1/chat/completions` 

7) Connect a ShowText|pysssss node to the troubleshooting output of the APE node, then go ahead and run your workflow. If you have any issues the troubleshooting output should help you diagnose the problem.


More Requirements:

Usage:

I reccommend starting off using Style Prompt with a full SDXL Base and Refiner model, these models have the depth and labeling of art styles and artists that works well with this node. You'll find a Workflow image in the custom_nodes/Plush-for-ComfyUI/Example_workflows directory. If you want a quick setup, drag this image directly onto your ComfyUI workspace in your browser, it will automatically load the graph. The new OpenDalle model model is also reccomended. Style Prompt doesn't work well with quick print/turbo workflows like LCM that rely on low cfg values. Stable Diffusion has to implement the whole (or most) of a fairly detailed prompt in order to get the right style effect, and these workflows just don't pick everything up. At least initially I recommend you use the more basic SDXL workflows and models

New to Style Prompt is the ability to interpret images and convert them into Stable Diffusion prompts using the new ChatGPT vision model. You will be using the "gpt-4-vision-preview" model if you decide to use an image in your input, regardless of your GPTmodel selection. It's the only model that can handle image input.

You can use this feature to:


StylePrompt

Style Prompt:

Inputs:

prompt: Your prompt, it doesn’t need to be wordy or complex, simpler prompts work better.

image (optional): Attach a "load image" or other node with an image output here. The image will be interpreted by ChatGPT and formulated into a prompt for Stable Diffusion. You can include an image alone, or an image + prompt. In the latter case both the prompt and image will be interprted by ChatGPT. When an image is included for interpretation, Style Prompt will automatically use the OpenAI "Vision" model (gpt-4-vision-preview) instead of the model selected in the "GPTmodel" field. This is because it's the only ChatGPT model that will accept image input.

example (optional): A text example of how you want ChatGPT’s prompt to look. There’s a default example in Style Prompt that works well, but you can override it if you like by using this input. Examples are mostly for writing style, it doesn’t matter if they pertain to the same subject as your prompt.


Outputs:

CGPTprompt: The prompt ChatGPT generates for your image, this should connect to the CLIP node. Alternatively you can have a text display node either in-line between Style Prompt and the CLIP node, or as a separate branch off this output. In either case a text display node will show you the ChatGPT generated prompt.

CGPTInstruction (optional): This will show you the instruction that was sent to ChatGPT along with the prompt. The instruction tells ChatGPT how to treat the prompt. It’s pretty much the same every time so typically it’s not worth hooking up after you’ve seen a couple.

Style Info (optional): If the style_info UI control is set to “true”, this will output a brief backgrounder on the art style you’ve chosen: This will display important characteristics of the style, its history and the names of some artists who have been influential in that style. This will require connecting it to a text display box if you’re going to use it.

Help: Hook up a text display node to this output and press the Queue button to see a brief help file that explains the functions of the UI Input elements.


UI inputs:

GPTModel (default gpt-4): The ChatGPT model that’s going to generate the prompt. GPT-4 works better than GPT-3.5 turbo, but 3.5 is slightly cheaper to use. The new GPT-4Turbo is now included.

Creative_lattitude (default 0.7): This is very similar to cfg in the KSampler. It’s how much freedom the AI model has to creatively interpret your prompt, example and instruction. Small numbers make the model stick closely to your input, larger ones give it more freedom to improvise. The actual range is from 0.1 to 2.0, but I’ve found that anything above 1.1 or 1.2 is just disjointed word salad. So I’ve limited the range to 1.2, and even then I don’t go above 0.9.

Tokens (default 500): A limit on how many tokens ChatGPT can use in providing your prompt. Paid use of the API is based on the number of tokens used. This isn’t how many ChatGPT will use, it’s a limit on how many it can use. If you want to strictly control costs you can play around with the minimum number of tokens that will get you a good prompt. I just leave it at 500.

Style (default Photograph): This is the heart of Style Prompt. I’ve included a list of dozens of art styles to choose from and my instructions tell ChatGPT to build the prompt in a way that pertains to the chosen style. It’s ChatGPT’s interpretation of the art style, knowledge of artists that work in that style, and what descriptive elements best relate to that style that makes the node effective at depicting the various styles.

Artist (default 1, range: 0 - 3): Whether to include a “style of” statement with the name of 1 to 3 artist(s) that exemplify the style you’ve chosen. Style Prompt is better at depicting the chosen style if this is set to 1 or greater. If you don't want to include an artist, set this to 0.

prompt_style (default, Tags): Let's you choose between two types of prompts: Narrative: A prompt style that is long form creative writing with grammatically correct sentences. This is the preferred form for Dall_e. Tags: A prompt style that is terse, a stripped down list of visual elements without conjunctions or grammatical phrasing. This is the preferred form for Stable Diffusion and Midjourney.

Max_elements (default 10): The maximum number of descriptive elements for ChatGPT to include in its generated prompt. Stable Diffusion gives the highest weighting to text at the beginning of the prompt, and the weighting falls off from there. There’s definitely a point where long wordy SD prompts result in diminishing returns. This input lets you limit the length of your prompt. The range here is from 3 to 25. I think 6 to 10 works about the best.

Style_info (default false): If this is set to true, Style Prompt will send a second request to ChatGPT to provide a description of the chosen style, historical information about it, and information on some of the most influential artists in that style.

Examples:

Alt Text

Prompt: Fish-Eye lens Photograph of a joyful young woman on a bustling downtown street, her smile amplified by the distorted perspective, skyscrapers curving around her in a surreal fishbowl effect, their windows reflecting the radiant midday sun, the surrounding crowd and traffic appearing as miniature figures in the margins, parked cars stretched and skewed into bizarre shapes, the blue sky overhead warped into a swirling dome, style of Justin Quinnell.

Alt Text

Prompt: High Key Photography of a noir-era actress, vividly red lipstick, sparkling diamond jewelry, soft-focus background, luxurious fur stole, pearlescent lighting effects, dramatic high-contrast shadows, and mirrored reflections, style of Terry O'Neill.

Alt Text

Prompt: Digital Art, female portrait, abstract elements, red hair, polka dots, vibrant colors, contrast, geometric shapes, surrealism, bold makeup, dripping paint effect, large eyes, stylized features, style of Patrice Murciano, style of Aya Kato.

Alt Text

Prompt: Fantasy Art of a radiant young woman, her eyes glowing with an ethereal light, clad in a cloak of starlight, amidst a sprawling urban jungle, buildings bathed in the soft hues of twilight, with the stylized graffiti murals pulsating with arcane energy, under the watchful gaze of celestial constellations, style of Yoshitaka Amano.

Alt Text

(Dall-e3 node) Prompt: Chiaroscuro Art: female warrior, profile view, low key lighting, contrast of shadow and light, detailed battle dress, animal skins, flowing black hair, blood smeared face, victory shine, defiant wind, exhalation pose, dark stormy sky, distant lightning, triumphant spear thrust, high heels, leather wristbands, feathered necklace, steel breastplate, war paint stripes, rusted spear, cracked shield, muddy battlefield, fallen enemies, style of Mario Testino.

Alt Text

Prompt: Low Key Photograph of a young woman, her features highlighted by a single, dramatic light source, cradling a small dog in her arms, the dog's coat a play of shadow and sheen, against a backdrop of deep, impenetrable shadows, the surrounding space filled with soft whispers of darkness, the barest hint of a window barely discernible in the background, the light creating a stark contrast between subject and surrounding, style of Bill Henson.

Alt Text

(Dall-e3 node) Prompt: High Key Photograph of a sun-bleached Sonoran desert landscape, the towering silhouette of a saguaro cactus against a bright, cloudless sky, crescent shaped sand dunes under harsh midday illumination, distant mountains reduced to ghostly outlines, the play of light and shadow accentuating the textures of the desert, each grain of sand aglow, all bathed in an intense, blinding light, style of Michael Frye

Alt Text

Prompt: Fashion Sketch of a statuesque model draped in a flowing grey dress, adorned with vibrant yellow accents, posed against a minimalist white background, with sharp, angular lines defining her silhouette, a dramatic contrast of shadow and light to highlight the fabric's texture, her gaze focused and intense, emanating an air of sophistication, her enigmatic smile hinting at a story untold, and a yellow hat as the final touch, style of Hayden Williams.

Alt Text

Prompt: Biomorphic Abstraction, surreal portrait, female figure, high-contrast, oversized eyes, glossy lips, polychromatic splashes, geometric shapes, dripping paint, monochromatic background, stylized features, sharp shadows, dynamic composition, style of Kandinsky, style of Joan Miró.

Alt Text

(Dall-e3 node) Prompt: Long Exposure Photograph capturing a solitary blue sailboat with its sails fully unfurled, gliding over a smooth, glass-like sea under the ethereal glow of a full moon. The image is framed to emphasize the stark contrast between the deep, velvety blues of the night sea and the subtle, shimmering silver path created by the moonlight. The sailboat is positioned slightly off-center, sailing towards the right, inviting the viewer's gaze to follow its journey. The surrounding darkness envelops the scene, with the moon's reflection acting as the main source of illumination, creating a serene yet mysterious atmosphere. The composition is minimalist, focusing on the interplay of light and shadow, the texture of the sailboat against the liquid mirror of the sea, and the infinite horizon merging sea and sky into one. Style of Michael Kenna.

Alt Text

Prompt: Art Deco of a poised young woman in a sleek, geometrically patterned dress, her sharp silhouette highlighted against a jeweled sunset, standing on the crest of a manicured grassy hill, her eyes glinting with reflected urban skylines composed of streamlined skyscrapers in the distance, her hands softly clutching an elegant sequined clutch and a feathered hat delicately perched on her bobbed hair, all under the radiant glow of a large, low-hanging moon, style of Tamara De Lempicka.

Alt Text

Prompt: Zulu Urban Art, detailed female portrait, half-shaved head with blonde hair, geometric patterns, bold contrasts, abstract shapes, vibrant colors, dripping paint, surreal composition, expressive eyes, red lips, polka dots, modern fashion, style of Kobra, Shepard Fairey.

Alt Text

Prompt: Origami of a poised young woman crafted from intricate, emerald-green folds standing tall on a textured, grassy hill, with a meticulously folded skyline of a bustling city in the distance, under a sweeping blue paper sky, the sun casting long, dramatic shadows, all bathed in soft, warm light, style of Robert J. Lang.

Alt Text

Prompt: Fashion Art of a sophisticated young woman standing central, adorned in an avant-garde, voluminous tulle gown cascading to the grassy hill under her feet, an ornate oversized feather hat on her head, peering into the distance with a mysterious, melancholic gaze, her body illuminated by the glowing moon, the sprawling city skyline serving as a contrasting backdrop, in the strikingly dramatic style of Alexander McQueen.

SPImagePlusPrompt

Style: Photograh, Example of Image +Prompt Input

=======


OAI Dall_e Image

Alt Text

I’m not going to go into detail about this node. The main thing is that it takes your prompt and outputs an image. Right now it’s only setup to use dall_e3 as the required input values are too different for me to include dall_e2. Dalle_e3 produces better images so I just didn’t think accommodating Dall_e2 was worth it.

You should be aware that in the API implementation Dall_e completely rewrites your prompt in an attempt to control misuse. The text of that rewritten prompt is what is produced by the Dall_e_prompt output in this node. This can create some odd results, and some prompts will generate a ComfyUI error as Dall_e reports that the prompt violates their policies. This can happen even with very benign subject matter. Right now I think the Dall_e engine built into the Edge browser gives better results than the API, but every once in a while this will produce a winner.

=======