comfyanonymous / ComfyUI

The most powerful and modular stable diffusion GUI, api and backend with a graph/nodes interface.
https://www.comfy.org/
GNU General Public License v3.0

Simple changes to massively simplify ComfyUI in basic use-cases #1132

Open oxysoft opened 11 months ago

oxysoft commented 11 months ago

Hi, I see that ComfyUI is getting a lot of ridicule on socials because of its overly complicated workflow. Users are now starting to doubt that this is really optimal. When I see the basic T2I workflow on the main page, my natural reaction is that this is far too much: 7 nodes for what should be one or two, and hints of spaghetti already!! This cannot be taken lightly; it is a drastic realization.


Let me paint a picture: I was recently stranded at a bar in Montréal around 1 AM. Already I had trouble managing the zippers in my bag, and now I had to get home. I downloaded Uber for the first time, and it turns out that the app is a lot like ComfyUI: a labyrinth of bad mobile UI screens. Well you see the problem is that I was very tired and high and drunk. Eventually I managed to make it home, but now right before I pass out at 3 AM I have a sudden spark of creativity, and when I get to work I'm greeted with this monstrosity:

[image: comfyui_screenshot]

As an artist, I represent your target demographic, and it's imperative the UX be designed as if the user could only use 1% of their brain at any time.


Thankfully, just a few tweaks massively improve the situation and make ComfyUI much more comfy to use for most people:

  1. Make the save node optional and add a toggle to automatically save all unused latent or image outputs.
  2. Allow wiring latents to image inputs and implicitly VAE-decode them.
  3. You could drastically reduce spaghetti by holding state along traversal and automatically resolving inputs that have nothing passed in; for example, the VAE loaded with the checkpoint could be inferred when the node is connected upstream, so in the image above you wouldn't have to pipe VAE from Load Checkpoint to VAE Decode. The way this would work is that you drag a connection from the header of the node (you can already see a dot next to the name; imagine that on the right side of the header as well) so it carries all of the inputs/outputs and auto-assigns them by name, overriding them in a dictionary. (See the sketch after this list.)
  4. Eliminate the Empty Latent Image and make it implicit when no latent is provided.
  5. Detect implicit globals to prompt for when data is missing. In the overwhelming majority of cases, the user doesn't need to load more than one checkpoint. This also solves the problem created by eliminating the Empty Latent Image: if you remove those two nodes, ComfyUI would detect the following implicit globals to assign: ckpt_name, width, height, batch_size.
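
Purely as an illustration of point 3, here is a toy Python sketch of resolving an unwired input by walking upstream and taking the nearest output with a matching name. The `Node` class, its fields, and `resolve_input` are all hypothetical and have nothing to do with ComfyUI's actual graph format or execution code:

```python
# Illustrative sketch of point 3 only; not ComfyUI's real data structures.
from dataclasses import dataclass, field

@dataclass
class Node:
    title: str
    outputs: list[str]  # names of the outputs this node produces
    inputs: dict[str, "Node | None"] = field(default_factory=dict)  # name -> upstream node, None if unwired

def resolve_input(node: Node, name: str) -> Node | None:
    """Return the node that should feed `name`, walking upstream when it is unwired."""
    if node.inputs.get(name) is not None:   # explicitly wired: keep the user's choice
        return node.inputs[name]
    # Breadth-first walk through whatever *is* wired, looking for a matching output name.
    queue = [n for n in node.inputs.values() if n is not None]
    seen = set()
    while queue:
        upstream = queue.pop(0)
        if id(upstream) in seen:
            continue
        seen.add(id(upstream))
        if name in upstream.outputs:        # nearest ancestor exposing e.g. "vae" wins
            return upstream
        queue.extend(n for n in upstream.inputs.values() if n is not None)
    return None  # still missing -> would become an "implicit global" (point 5)

# Example mirroring the default workflow: VAE Decode's "vae" input is left unwired,
# but Load Checkpoint (reachable via the sampler chain) exposes a "vae" output.
ckpt = Node("Load Checkpoint", outputs=["model", "clip", "vae"])
sampler = Node("KSampler", outputs=["latent"], inputs={"model": ckpt})
decode = Node("VAE Decode", outputs=["image"], inputs={"samples": sampler, "vae": None})

print(resolve_input(decode, "vae").title)   # -> "Load Checkpoint"
```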

To a node purist, these seem like a misuse of the node-based approach to software UX, but in the context of a sane user workflow these are simple compromises that help manage the complexity and overhead, much like syntax sugar in programming languages. I hope you will consider these changes, I think everyone would be extremely excited to hear about it!

coreyryanhanson commented 11 months ago

Honestly, most of this sounds like something a set of third-party custom nodes that bundle tasks together could accomplish. These changes don't add any functionality; they are just specific instances of convenience.

As a user who just found this software this week, I think I totally could put together that node setup at 3am while drunk. It's just a question of practice.

I would hate to see it remove customization capabilities and introduce possible workflow unpredictability and friction in adapting to new model architectures, all in the name of making things easier for novices to node workflows. There is a diversity of programs out there; there is no reason for this program to cater to everyone, and it is impossible to do so effectively.

However, maybe other possible optimizations can bridge that gap and make everyone happy (the ability to group nodes into a single condensed node, or the fluidity to import sets of nodes without overriding your existing setup).

I won't claim a dictatorship of the target demographic, but I felt like this was a necessary counterpoint to make after seeing this style of persuasion.

qudix commented 11 months ago

I agree with coreyryanhanson; a node-based system is inherently complex, and having too much implicit, hidden control flow is a bad idea. I prefer the approach from #931, which would allow creating all-in-one nodes while still allowing the user to see under the hood.

ltdrdata commented 11 months ago

I think the implicit empty latent is a good idea. ComfyUI now supports optional inputs, so if a latent isn't provided, we can assume it is an empty latent.

One thing is a problem, though: the latent size.

This should be a custom node.
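
A minimal sketch of what that custom node could look like, assuming ComfyUI's usual custom-node interface (`INPUT_TYPES` / `RETURN_TYPES` / `FUNCTION` / `NODE_CLASS_MAPPINGS`). The node name `LatentOrEmpty` and the widget defaults are made up for illustration; the latent is an optional input, and the size widgets are only used when nothing is connected, which is one way to handle the latent-size problem:

```python
# Hypothetical custom node; follows ComfyUI's custom-node convention but is not
# an official implementation.
import torch

class LatentOrEmpty:
    """Pass through a connected latent; otherwise build an empty one."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "width": ("INT", {"default": 512, "min": 64, "max": 8192, "step": 8}),
                "height": ("INT", {"default": 512, "min": 64, "max": 8192, "step": 8}),
                "batch_size": ("INT", {"default": 1, "min": 1, "max": 64}),
            },
            # Optional input: when nothing is connected, ComfyUI simply omits it.
            "optional": {
                "latent": ("LATENT",),
            },
        }

    RETURN_TYPES = ("LATENT",)
    FUNCTION = "get_latent"
    CATEGORY = "latent"

    def get_latent(self, width, height, batch_size, latent=None):
        if latent is not None:
            return (latent,)
        # Same shape as the built-in Empty Latent Image node: [B, 4, H/8, W/8].
        samples = torch.zeros([batch_size, 4, height // 8, width // 8])
        return ({"samples": samples},)

NODE_CLASS_MAPPINGS = {"LatentOrEmpty": LatentOrEmpty}
```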

melMass commented 11 months ago

To me it sounds like subgraphs and/or Workflow Component (https://github.com/ltdrdata/ComfyUI-Workflow-Component) are better suited for users who want fewer nodes.

ColorfanXP commented 11 months ago

I'm still trying to figure out what's so comfy about it. It's very complex, but I'm slowly figuring it out.. I think..

fuami commented 11 months ago

As someone who has used many pieces of software where "magic" or "automatic" behaviors have been introduced, this sort of thing feels like a bad idea to me. It can often lead to confusion and misunderstandings.

I've only been using ComfyUI for about a week, coming from Automatic1111 prior, and one of my favorite things about this tool is that it isn't doing a lot of the "magical" things that A1111 did to try to "just make things work". I was constantly frustrated with some of the things A1111 did, and I never knew why certain things worked while others didn't. Sometimes it worked perfectly; sometimes it just seemed to ignore everything I put in the prompt.

I constantly found myself baffled by how things did, or didn't, influence the output.

After using ComfyUI for a few days, I started to understand that the oversimplification of A1111 gave me a lot of bad misconceptions about how some of the concepts and the software work under the hood.

I think that if the tool starts introducing hidden, magic behaviors, it will lose some of the qualities that presently make it self-explanatory.

If people are concerned about a few nodes and connections, I'm not sure that ComfyUI makes sense for them. If the basic default workflow is daunting, are you really looking for the flexibility the tool provides?

asagi4 commented 11 months ago

If you want a nicer UI for ComfyUI, try out comfybox. It allows you to hide the graph behind a customized UI that is easier to use.

I have a custom workflow that uses AITemplate and on-demand upscaling with the tile ControlNet; it's so much better than A1111 that I don't really have a reason to use A1111 anymore. I can get ~14 it/s on my 3060 (512x512) using AITemplate compared to ~8 in A1111, so ComfyUI is almost twice as fast as A1111 at small gens. Not having to regenerate the latent every time I choose to upscale a smaller generation (as HR fix does) also saves quite a lot of time.

I think ComfyUI can definitely still do a lot to improve on UI and UX, but it'll never be (nor should it try to be) a black-box "easy" solution like A1111.

tildebyte commented 11 months ago

My opinion (no association with this project).

This is a power-user tool. If you want monkey-go-click, use auto-1111.

If you want complete understanding of the process of how all of this works, and/or complete control, you're in the right place.

The purpose of Comfy (and other tools which use a node-based workflow model - Nuke, Cycling '74 Max, etc.) is exactly to be a low-level visual programming tool comprised of easily composable atomic nodes - anything which introduces illogical or unintuitive "magic" evaluation breaks the entire model.

There are already custom nodes available (as mentioned above) which allow for saving composed nodes as a new "atom" (and sub-graphs/reusability are certainly part of this paradigm - again, Max immediately comes to mind), but which do NOT "break the model".

theandreburmet commented 11 months ago

I think we should take inspiration from what already exists for graphic designers. I work in VFX, and I use Nuke and Houdini; they're excellent interfaces. Choosing a nodal interface makes it easier to access the underlying code through the nodes. I find it incredible that ComfyUI exists. However, one point where the user experience could evolve is the ability to split the interface in two, with the graph at the bottom and the final image at the top. As in Nuke, this would avoid having to go back and forth in the graph. There's also the option of adding connection points, hiding wires, or making them straight.

[attached: Nuke interface screenshots]

melMass commented 11 months ago

You can do a few of these:

Now, regarding Nuke's Viewer: I don't think that would be doable, since the data is not "live" in Comfy, but it could be done a bit like https://github.com/pythongosssss/ComfyUI-Custom-Scripts#show-image-on-menu

ColorfanXP commented 11 months ago

Yeah, I was thinking about that earlier this week. I've used Maya and Blender, so I'm familiar with constructing node-based shaders. However, in those apps, the nodes are more of a tool to build a component; the interface of the rest of the app is pretty traditional. It would be great if there were a way to have a view with a split viewport for different things, like an output image view, a prompt view, a sampler settings sidebar, etc. Shuffling around in Comfy and going from prompt to image to whatever node gets really tedious.

coreyryanhanson commented 11 months ago

If there were to be a viewer for the output image, I'd prefer an implementation where it is set to whatever node the user chooses.

Blackmagic Fusion has a somewhat elegant way of doing this: you click on a node and, with a single keyboard shortcut, assign one of two viewers to it. I think something like that would be preferable to a generic viewer for "the output image", because someone's workflow could involve multiple Save Image nodes. Rather than have the machine guess which one, it would be better to have the user determine it.

But at the same time, many of these nodes don't have any form of visual associated with them. The more I think about it, the less I think simple preview windows translate easily to this setting.

tildebyte commented 11 months ago

hiding wires

Somewhere out there is a set of "broadcast/receive" nodes (can't remember which repo off-hand) which could benefit those with wire allergies

0xdevalias commented 7 months ago

A few seemingly related issues I came across in my exploration today:

airtonix commented 3 months ago

It's not the workflow/pipes that are the problem.

It's discovering and picking nodes to add; a simple dropdown is horrible UX.

Nodes should, at a minimum, require:

Then an interface like https://github.com/pacocoursey/cmdk can be used to provide better discoverability of nodes.

melMass commented 3 months ago

@airtonix Yeah, especially if you install a bunch of extensions. Sidebar was suggested to me to solve that; it's kind of a solution in the meantime: https://github.com/Nuked88/ComfyUI-N-Sidebar

tildebyte commented 3 months ago

EDIT: Gotta say, though, that Sidebar is pretty slick

[1] For example, the 'CATEGORY' string can be a '/'-separated list of submenus, e.g. "comfy/samplers/my_thing" or "my_nodes_collection/image/foo".
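
For reference, a minimal (hypothetical) node class showing where that CATEGORY string lives; only the '/'-separated CATEGORY convention is taken from the footnote, the rest is placeholder:

```python
# Hypothetical example node; everything except the nested CATEGORY path is made up.
class FooNode:
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {"image": ("IMAGE",)}}

    RETURN_TYPES = ("IMAGE",)
    FUNCTION = "run"
    # '/'-separated path -> the node appears under nested submenus in the Add Node menu.
    CATEGORY = "my_nodes_collection/image/foo"

    def run(self, image):
        return (image,)

NODE_CLASS_MAPPINGS = {"FooNode": FooNode}
```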