WASasquatch opened this issue 1 year ago
Are you suggesting a process where information is accumulated through each node during processing via execution instead of being extracted from a static workflow? It would be a good idea.
Something along those lines, if possible. I think most of the information is already raw at the sampler node, except for the positive/negative prompt. That might need to be decoded, or maybe passed along from the conditioning nodes as a hidden input containing the original raw text.
But we have prompt (API) data and workflow data; it'd be nice if we also had generation data that pertains specifically to the actual image, for reference without needing ComfyUI.
Example:
"generation": {
"prompt": "Comfyanonymous standing triumphantly atop a mountain of cotton candy",
"negative": "bad art, text, rock, stone",
"model": "coffeemix_v15.safetensors",
"seed": 8489221,
"steps": 20,
"cfg": 8.000,
"sampler_name": "euler",
"scheduler": "normal",
"denoise": 1.000
}
The issue with doing this is that there's a bunch of edge cases that need to be handled.
For example, what happens if someone has a Conditioning Average or a Conditioning Combine in their positive?
Well, this last example is, let's say, limited; one can end up with weird stuff in ComfyUI, with LoRAs, conditioning combines, etc. etc. But I think there is already node-name S&R (haven't used it though). So maybe on the save/preview image node we can add a field that accepts the same/similar patterns as for the file name, just longer of course, and store it in the metadata. That way you can set it up however you want, maybe even like you showed in the last example. It would just need a text area for entry, to be able to enter lots of text :)
Yeah, I do get that. And I know I've done a few cases like that, but what if we piped the raw prompts as a hidden field from the conditioning nodes to the sampler using them, maybe as something like an A/B or 1-2-3 list? Since that's what we as humans are used to as the input for the generation anyhow.
I'm sure there is some way to take the hidden inputs from the positive/negative conditioning respectively, draw upon their raw conditioning texts, and list them out.
I don't think we as humans are going to make sense of concatenated conditioning, or conditioning in general, so that wouldn't be data we'd be interested in.
The nodes would add to the text list in succession, which would give you the run order, too:
"generation": {
"prompt": [
"Comfyanonymous standing triumphantly atop a mountain of cotton candy",
"bright pastel colored sky"
],
"negative_prompt": [
"bad art, drawing, sketch, line-art",
"text, copyright, watermark",
"rock, stones",
"embedding:easynegative"
],
"model": "coffeemix_v15.safetensors",
"seed": 8489221,
"steps": 20,
"cfg": 8.000,
"sampler_name": "euler",
"scheduler": "normal",
"denoise": 1.000
}
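Purely as a sketch of that accumulation idea (not actual ComfyUI code): assume each text-encode node appends its raw string to a list carried along with the conditioning, and the save node assembles the dict above. All names here are made up.

# hypothetical sketch: raw prompts accumulate in execution order
def encode_with_raw(clip, text, raw_list):
    raw_list.append(text)  # appending in succession doubles as run order
    return [[clip.encode(text), {"raw": raw_list}]]

def build_generation_metadata(positive_raw, negative_raw, ckpt_name, seed, steps,
                              cfg, sampler_name, scheduler, denoise):
    # mirrors the JSON block above
    return {
        "prompt": positive_raw,
        "negative_prompt": negative_raw,
        "model": ckpt_name,
        "seed": seed,
        "steps": steps,
        "cfg": cfg,
        "sampler_name": sampler_name,
        "scheduler": scheduler,
        "denoise": denoise,
    }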
I think the main point, besides human readability for finding info to reference, is focused per-generation info for custom nodes, extensions, and web services (galleries, download systems, whatever focuses on ComfyUI or supports it).
I also don't think this approach guarantees reproducibility, but at least it can be a compromise within human-readable limits.
Separate from WAS's suggestion, it seems like a good idea to filter only the nodes that are actually involved in the generation on the execution path and encapsulate them in a workflow. Alternatively, muting the nodes that are not involved in the actual generation in the full workflow could also work, since sharing the full workflow can still be beneficial.
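For the filtering part, a minimal sketch of "keep only what the saved image actually depends on", assuming the API-style prompt dict where linked inputs are [source_node_id, output_index] pairs (an illustration only, not tested against ComfyUI itself):

def filter_used_nodes(prompt, output_node_id):
    # walk upstream from the output node and keep only reachable nodes
    used, stack = {}, [str(output_node_id)]
    while stack:
        node_id = stack.pop()
        if node_id in used:
            continue
        node = prompt[node_id]
        used[node_id] = node
        for value in node["inputs"].values():
            # a linked input looks like [source_node_id, output_index]; widget values are plain
            if isinstance(value, list) and len(value) == 2 and str(value[0]) in prompt:
                stack.append(str(value[0]))
    return used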
One thing with filtering: notes. Notes are very useful. Right now, while experimenting with a hi-res sampler, I go to Civitai, find some image that I want to sort of reproduce, and put the image URL in a note, so that if I save the generated image I can easily find what I was trying to reproduce. But since notes are detached from everything, a simple "take what's connected" approach would lose them, which would be sad :) I still think it's better to have a separate metadata field on the PNG that can do S&R like the file name does, maybe pre-populated with some default.
I'm sure it would be easy to omit stripping notes, but at the same time, many may not want to share notes that could pertain to secret sauces.
I am only for personal reproduction; I don't give two cents about the whole spoiled community thinking everyone can demand prompts and reproducibility from people, which lends to the whole "AI art has no soul" argument, because no one is doing anything themselves. CivitAI, while a good resource, has been nurturing totally the wrong type of AI community IMO.
In retrospect, instead of a list, the prompts could be dictionaries keyed by an ID of the corresponding execution chain, so that, if need be, a text could be traced back in the prompt dict to exactly where it comes from (and thus what it was combined with).
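Something shaped roughly like this (the node IDs are made up, just to illustrate):

"prompt": {
    "6": "Comfyanonymous standing triumphantly atop a mountain of cotton candy",
    "12": "bright pastel colored sky"
}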
IMO automatic generation of this would be very complex and wouldn't cover all cases: a lot of work for fairly uncertain results. That's why I said I think it's better to have some templates and allow S&R for values in them; of course you could edit the templates. Then you can set the output format to whatever you like for whatever complex flow you have. Maybe someone isn't even a huge fan of JSON (not sure why :) but maybe just a non-technical person) and wants a more free-style output for their simple flows.
Why do you think that? All this information is already being piped. We just need to pipe the corresponding info that a sampler is actually using; additionally, conditioning could just pipe in a tuple or something with the raw text data behind the scenes. And if it is that complex, it's probably time to rethink how this works. This is something that is really just needed. Expecting people to have ComfyUI, load it up, and load an image to find out what it used is really frustrating. Everyone else is watermarking or embedding this information, because it is pretty important.
We already have workflows, prompts, and people working on graphs, etc.; we wouldn't need "templates", and that is removed from the whole point here.
I think the fact that CLIP embeddings are lossy is a reason the relevant info should be available in an image, for reference on other platforms, let alone Comfy, as well as for community discussion and sharing. Again, we shouldn't have to share our whole workflow, especially in the many cases where it probably won't work out of the box due to models, plugins, embeddings, etc., just to try to get the prompts used.
Complex because ComfyUI flows are basically programs, arbitrary stuff. I know I've had plenty of flows, especially while experimenting, that there is just no way in hell can be serialized into something readable and comprehensible. And that's not even getting into whatever custom nodes someone might have installed. Sure, if you have a positive/negative prompt and one sampler, yeah, no big deal. But then I add something weird like two advanced KSamplers with partial steps, maybe piping in another model in the middle. What sort of textual representation do you imagine writing for that? Why should someone code a conversion of the node mesh into such a thing? That would be very non-trivial. That's why I'm saying: just let the person write whatever they want and use node-parameter S&R in that text. That I'd find extremely useful and flexible. When I have a bunch of images from the same or a similar flow with some tweaked params, it would be very nice to see what the generation differences were in those cases, without the clutter of all the complexity, which any automated conversion would introduce anyway.
I'm not sure you understood what I mean by templates. It's just some text you write, in whatever format you like, even HTML:
<h1>%prompt.positive%</h1><br><ul><li>%sampler1.cfg%</li><li>%sampler2.modelName%</li></ul>
Or whatever I deem important for my experiments at the moment, in whatever format I like at that point in time. And maybe some predefined templates for basic flows (like the ones in the ComfyUI examples).
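For illustration only, here is roughly what that kind of %node.param% substitution could look like, assuming the resolved values are handed over as a flat dict; the key names are made up:

import re

def expand_template(template, params):
    # params: e.g. {"prompt.positive": "...", "sampler1.cfg": 8.0, "sampler2.modelName": "..."}
    def lookup(match):
        return str(params.get(match.group(1), ""))
    return re.sub(r"%([\w.]+)%", lookup, template)

# expand_template("<h1>%prompt.positive%</h1><ul><li>%sampler1.cfg%</li></ul>", params)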
Oh, another point with such templates: if they are supported, we could add a wilder feature, since images already have the full workflow saved. Add a CLI script that runs a given template over given images and appends the data in whatever format the template describes (since it's the same parsing engine anyway). That way, even for old images I could create views of what I think is important. Of course I could potentially do that now, but the structure is more complex; if such template generation were built in, it would be simpler. And most importantly, it already seems to be in the system, for file names at least, so extending it to arbitrary metadata fields should be fairly simple.
Images already have the prompt saved to run it exactly as it was. But you are missing the entire point still.
For one, it's not that bad, and the information is readily available. If conditioning shipped with its raw text to the sampler, and the sampler included it downstream to the image save, it's really no issue. I don't know if you have even looked at the workflows and nodes.
Two, this is about simply seeing what prompts were involved with the actual image, not reproducibility. I don't always have access to ComfyUI, so when I show colleagues I am guessing at what was actually used in my image's workflow based on memory, when it could just be gathered up: no extra info that doesn't pertain to the gen. This is totally possible. The only issue is it wouldn't show how your conditioning was encoded and mixed, but that isn't the point, and it's almost irrelevant to just needing to reference parts of something for someone.
This isn't about CivitAI sharing and people just doing copypasta. It's about workflows in a business, and professional use.
For example, my style prompts and subject prompts are separate and would be easily distinguishable and referenceable. But in the full workflow, which of 10 styles was used? Which character was used? Etc., etc.
Downstream data in networks like this is very common, often for this very purpose: carrying pertinent linear execution data downstream to other nodes or to export. Nodes themselves should probably have a downstream buffer dict for all sorts of customization and data carriage anyhow.
It even looks like this data could be piped out of the text encoding as-is, given how it already returns nested lists with an empty dict:
# existing encode
def encode(self, clip, text):
    return ([[clip.encode(text), {}]], )

...

# with the raw prompt text carried along in the extras dict
def encode(self, clip, text):
    return ([[clip.encode(text), {"raw": text}]], )  # or whatever
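On the receiving end, a sampler or save node could then fish those raw strings back out of whatever conditioning it was handed, something like this (still assuming the hypothetical "raw" key above):

def collect_raw_prompts(conditioning):
    # conditioning is a list of [cond_tensor, extras_dict] pairs, as returned by encode()
    texts = []
    for _cond, extras in conditioning:
        if "raw" in extras:
            texts.append(extras["raw"])
    return texts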
I want this feature too. Here is how I'm thinking I'd do it: positive_prompt, negative_prompt, cfg, seed... ComfyUI workflows can be arbitrary, but this is at least something for getting data back into a workflow if your prompts all deal with the highly common use cases of adding conditioning, seed, sampler, etc.
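As a very rough sketch of the "get it back into a workflow" side for the common case, one could overwrite widget inputs in an API-format prompt dict by node class. This assumes the stock KSampler/CLIPTextEncode input names and glosses over positive vs. negative, which needs the tagging discussed below:

def apply_basic_params(prompt, params):
    # params: {"positive_prompt": ..., "negative_prompt": ..., "cfg": ..., "seed": ...}
    for node in prompt.values():
        if node["class_type"] == "KSampler":
            node["inputs"]["seed"] = params["seed"]
            node["inputs"]["cfg"] = params["cfg"]
        # CLIPTextEncode nodes can't be told apart as positive/negative without
        # extra tagging; see the ^meta discussion that follows
    return prompt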
I have spent some more time thinking of how to format this and here's what I've come up with
Taking this webui prompt:
I converted it into this custom interchange format:
The structured parameters format I imagined contains all the information about the graph inputs that the backend should receive. There is no specification for how they should be used, only hints in the ^meta field for the semantic type of the prompt input. For example, positive and negative conditioning are split into two separate conditioning nodes in ComfyUI. The user could tag each node indicating if it's positive or negative conditioning. When the parameters are loaded, the graph can be searched for a compatible node with the same inputTypes tag to copy the input to.
This format complements the existing workflow format. If all you want is to recreate the exact state of a ComfyUI graph, all you'd need to do is reload the workflow .json that already exists. However, if you want to send data between workflows, then that's where this format comes into play. It lets workflows speak to each other in the same language. They can declare "I take conditioning of positive and negative types" and another workflow can say "here's some output, it has positive and negative for you". This is something that can't be described with just the arguments for the graph executor alone, since positive and negative aren't traits of the conditioner nodes themselves, only how they're used by the graph author.
Also notice how in the parameters there are two k_sampler entries specified. One is the first-step txt2img; the second is the result of webui's HR fix that automatically upscales it as part of the same execution. In the new format these are separated so they can be relocated to the appropriate ComfyUI nodes.
"conditioning": [ { "^meta": { "types": [ "positive" ] }, "text": "1girl, tinkerer, workshop, gears, magnifying mask, focused, ((looking away)), vibrant, sublime, idyllic," }, { "^meta": { "types": [ "negative" ] }, "text": "(worst quality, low quality:1.3), (depth of field, blurry:1.2), (bad_prompt:0.4)" } ],
Could one keyword be in both the positive condition and the negative condition at the same time?
It could if the user connects the node like this. But this isn't something that ComfyUI would detect and tag the parameter with ^meta: { types: ["positive", "negative"] } automatically; it would be up to the user to say the prompt should be used in this way.
Also, a small issue I came across when thinking of this format is how it will support custom nodes. Some nodes can load more than one type, like the Efficiency Nodes. When the parameters are serialized for use with other workflows, there should be some way for those nodes to indicate what parameters they add.
I think adding this in the frontend would be cumbersome, because then custom node authors would have to write frontend converters for each of their backend nodes. Instead, I think the backend nodes should return this data:
class CustomLoraLoader:
    """
    LoRA node that sets both weights to be the same
    """
    @classmethod
    def INPUT_TYPES(s):
        return {"required": {"lora": ("string", ),
                             "strength": ("float", {"default": 0, "min": 0, "max": 1.0}),
                             }}

    def STD_PARAMS(self, lora, strength):
        # returns the structured parameters this node contributes
        return {
            "lora": [
                {
                    "model_name": lora,
                    "model_hashes": {
                        "sha256": get_lora_sha256(lora),
                    },
                    "strength_unet": strength,
                    "strength_tenc": strength,
                }
            ]
        }
Notice here that the number of output fields in the parameters does not correspond to the number of inputs. That's a big reason why I don't think you can just serialize the inputs to the graph like for the /prompt endpoint and have the complete parameters ready. For LoRA models, ComfyUI only receives the filepath of the model to load, but when reproducing prompts it's important that the model hash for the LoRA used matches, not the filename. So when the parameters are serialized, the backend has to return the LoRA's hash if it's to be included.
With this design there will also need to be a way to convert parameters back into graph inputs ([lora.model_name, lora.model_hash] -> lora_path), but I haven't thought of what that should look like yet. Should the backend support "adapters" that define functions/API endpoints for converting inputs like [lora.model_name, lora.model_hash] to lora_path? Or maybe the internal LoRA loading code could support extra data in annotated filepaths, like the hash of the model instead of the actual model filename. I'm not really sure yet.
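One possible shape for such an adapter, purely as a sketch (the folder scan and hashing helper are assumptions, not existing ComfyUI code):

import hashlib
import os

def sha256_of(path, chunk_size=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def resolve_lora_path(lora_dir, model_name, model_hash=None):
    # prefer a hash match; fall back to the stored filename if nothing matches
    for fname in os.listdir(lora_dir):
        path = os.path.join(lora_dir, fname)
        if model_hash and os.path.isfile(path) and sha256_of(path) == model_hash:
            return path
    return os.path.join(lora_dir, model_name)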
Thinking about this more, there could be two different use cases for dropping files/workflows onto the program. One is to load the entire workflow and node graph. The other is to load just the parameters into the existing graph without changing any structure.
So a prompt format with just the parameters would be useful in preserving the workflow graph's layout and connections.
I could see that being useful for loading old conditioning prompts into the same workflow, where you have fine-tuned a bunch of settings.
I am also interested in this, or at least in some way of embedding the data I find interesting so I can parse it with other software.
I am adding support for ComfyUI in my image manager, but I am currently facing some issues. I believe that if this feature request can be implemented, it would be of great help to my project.
I have already implemented some parts of it. If anyone is interested, please take a look at https://github.com/zanllp/sd-webui-infinite-image-browsing/issues/202.
Hi everyone,
I started porting One Button Prompt to ComfyUI. Not having the prompt saved anywhere in the image means you have to use WASasquatch's components to store the prompt in a file somewhere, separate from the image.
I am not looking for anything fancy, but it would be nice to have at least two text input fields (positive and negative prompt) on the Save Image node. This way you can connect any text node to them and store them in the image somewhere.
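For what it's worth, a minimal sketch of what such a save node could look like, using PIL's PngInfo text chunks the same way ComfyUI already embeds its prompt/workflow chunks (the node and field names here are made up):

import json
import numpy as np
from PIL import Image
from PIL.PngImagePlugin import PngInfo

class SaveImageWithPrompts:
    @classmethod
    def INPUT_TYPES(s):
        return {"required": {"images": ("IMAGE",),
                             "positive": ("STRING", {"multiline": True}),
                             "negative": ("STRING", {"multiline": True})}}
    RETURN_TYPES = ()
    FUNCTION = "save"
    OUTPUT_NODE = True
    CATEGORY = "image"

    def save(self, images, positive, negative):
        for i, image in enumerate(images):
            img = Image.fromarray(np.clip(255.0 * image.cpu().numpy(), 0, 255).astype(np.uint8))
            meta = PngInfo()
            # store the raw prompts as an extra text chunk next to ComfyUI's own prompt/workflow chunks
            meta.add_text("generation", json.dumps({"prompt": positive, "negative_prompt": negative}))
            img.save(f"ComfyUI_prompts_{i:05}.png", pnginfo=meta)
        return ()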
Hey all,
As this seems to be the place where this topic has the most replies/thoughts on the matter, I wanted to add some points here. I'm fairly new to Comfy myself but started tinkering with a custom node in the last couple of days.
I wanted to solve a problem I and many others seem to have, which is retrieving the parameters from an already generated image in order to reuse the same prompt for upscaling (as an example). As I learned during this process, this is quite a challenge for Comfy-generated images. I found an open-source app which solves this by traversing all possible paths through the graph, but it fails on a lot of occasions. Most of the reasons are discussed above.
Looking for a solution or some ideas, I found a lot of people on Reddit and in some issues here on GitHub missing an easy way to access the actual parameters used for the image generation.
As Comfy shifts into focus for more and more people, I would add my vote for this being a big improvement.
Are there any thoughts on adding this, or not adding this? Just asking to get an indication, as this thread already started 4 months ago.
Thanks for hearing/reading my thoughts.
I made a custom node ComfyScript that is able to translate workflows to human-readable Python scripts like this:
model, clip, vae = CheckpointLoaderSimple('v1-5-pruned-emaonly.ckpt')
conditioning = CLIPTextEncode('beautiful scenery nature glass bottle landscape, , purple galaxy bottle,', clip)
conditioning2 = CLIPTextEncode('text, watermark', clip)
latent = EmptyLatentImage(512, 512, 1)
latent = KSampler(model, 156680208700286, 20, 8, 'euler', 'normal', conditioning, conditioning2, latent, 1)
image = VAEDecode(latent, vae)
SaveImage(image, 'ComfyUI')
Though my main purpose is to make it easy to compare and reuse different parts of my workflows, it is possible to expand this to more use cases:
Run the script with some stubs to get any wanted information
For example, to get all positive prompt texts, one can define:
positive_prompts = []

def CLIPTextEncode(text, clip):
    return text

def KSampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise):
    positive_prompts.append(positive)
And use eval() to run the script (stubs for other nodes will be automatically generated). This way, Reroute, PrimitiveNode, and other special nodes won't be a problem stopping one from getting the information.
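To make that concrete, here is a hand-rolled version for the small script above (script_text is assumed to already hold the translated workflow text; with ComfyScript the stubs for the remaining nodes would be generated automatically rather than written by hand):

positive_prompts = []

# stubs only need enough shape for the script to run
def CheckpointLoaderSimple(ckpt_name):
    return None, None, None  # model, clip, vae placeholders

def CLIPTextEncode(text, clip):
    return text

def EmptyLatentImage(width, height, batch_size):
    return None

def KSampler(model, seed, steps, cfg, sampler_name, scheduler,
             positive, negative, latent_image, denoise):
    positive_prompts.append(positive)
    return None

def VAEDecode(latent, vae):
    return None

def SaveImage(image, filename_prefix):
    pass

exec(script_text, globals())
print(positive_prompts)  # ['beautiful scenery nature glass bottle landscape, , purple galaxy bottle,']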
It is also possible to generate a JSON by this. However, since JSON can only contain tree data and the workflow is a DAG, some information will have to be discarded, or the input will have to be replicated in many positions.
Directly run the script to generate images
The main advantage of doing this is being able to mix Python code with ComfyUI's nodes, like doing loops, calling library functions, and easily encapsulating custom nodes. This also makes adding interaction simple since the UI and logic can be both written in Python.
The main limitation is that we cannot get the output of nodes from Python before running the full workflow. But if #931 is someday merged, this limitation can be solved, and it will be possible to use ComfyUI just like a simple Python library.
This looks cool in general. I saw something else that did something similar but wasn't as fleshed out; it was more of a one-off parser.
I wouldn't hold your breath on that PR. It's old, and includes a lot of trivial stuff instead of focusing on the framework fixes themselves and doing extensions in another PR. This is an issue with other PRs that fix common issues or make ComfyUI work better: they often include a lot of personal-choice decisions in nodes and stuff that shouldn't ever have been included.
The workflow exporting is really cool, not knocking that at all, but maybe there should be a more straightforward way to distinguish what was actually used in the saved image's generation.
Currently, you get the entire workflow: every single CLIPTextEncode, whether in use or not. From an end-user standpoint of just trying to examine the image itself, it's not clear what was actually used for the saved image. Both the workflow and the prompt list out CLIPTextEncode nodes that were not even used in the generation process for the saved image, but were part of the chain.
I propose we add some specific metadata that pertains to the actual image itself, passed along from the samplers, not just the entire workflow. I don't always want to fire up ComfyUI (or have access to it) and load up a workflow (especially when I may need to save my current workflow first) just to grab a prompt I am referencing or that someone is asking about.
Subsequently I could hijack this information to create a prompt history system and prompt styles system. :P Win win. Lol
Here is an idea I drafted last night, works nicely. The idea for the auto keys was for the history list population, and when looking at the file directly.