ai-dock / comfyui

ComfyUI docker images for use in GPU cloud and local environments. Includes AI-Dock base for authentication and improved user experience.

ComfyUI serverless worker broken after adding new nodes to network volume #77

Open simonjaq opened 1 month ago

simonjaq commented 1 month ago

Hi. I'm stuck: I had a serverless worker that worked well for months. I had set it up using a pod, with everything on a network volume. Today I wanted to add more custom nodes and models, so I fired up a pod with the network volume attached, installed the new nodes and models, and also updated ComfyUI. Everything works in the pod, but now the serverless worker is broken. Shouldn't it pick up the whole installation from the network volume?

Here are some of the error messages:

2024-06-03T15:58:43.237222975Z   File "/runpod-volume/ComfyUI/custom_nodes/ComfyUI-Image-Tools/src/processor.py", line 4, in <module>
2024-06-03T15:58:43.237227724Z     from rembg import remove
2024-06-03T15:58:43.237232473Z ModuleNotFoundError: No module named 'rembg'
2024-06-03T15:58:43.237237222Z 
2024-06-03T15:58:43.237241972Z WARNING: Cannot import /runpod-volume/ComfyUI/custom_nodes/ComfyUI-Image-Tools module for custom nodes: No module named 'rembg'
2024-06-03T15:58:43.237245533Z WARNING: Traceback (most recent call last):
2024-06-03T15:58:43.237250283Z   File "/runpod-volume/ComfyUI/nodes.py", line 1879, in load_custom_node
2024-06-03T15:58:43.237255032Z     module_spec.loader.exec_module(module)
2024-06-03T15:58:43.237259781Z   File "<frozen importlib._bootstrap_external>", line 883, in exec_module
2024-06-03T15:58:43.237264531Z   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
2024-06-03T15:58:43.237269280Z   File "/runpod-volume/ComfyUI/custom_nodes/ComfyUI-Manager/__init__.py", line 6, in <module>
2024-06-03T15:58:43.237274029Z     from .glob import manager_server
2024-06-03T15:58:43.237278778Z   File "/runpod-volume/ComfyUI/custom_nodes/ComfyUI-Manager/glob/manager_server.py", line 18, in <module>
2024-06-03T15:58:43.237283528Z     import manager_core as core
2024-06-03T15:58:43.237288277Z ModuleNotFoundError: No module named 'manager_core'
2024-06-03T15:58:43.237294213Z 
2024-06-03T15:58:43.237297775Z WARNING: Cannot import /runpod-volume/ComfyUI/custom_nodes/ComfyUI-Manager module for custom nodes: No module named 'manager_core'
2024-06-03T15:58:43.237303712Z 
2024-06-03T15:58:43.237308461Z Import times for custom nodes:
2024-06-03T15:58:43.237313210Z    0.0 seconds: /runpod-volume/ComfyUI/custom_nodes/websocket_image_save.py
2024-06-03T15:58:43.237317960Z    0.0 seconds: /runpod-volume/ComfyUI/custom_nodes/comfy-photoshop-sd
2024-06-03T15:58:43.237323896Z    0.0 seconds: /runpod-volume/ComfyUI/custom_nodes/klinter_nodes
2024-06-03T15:58:43.237328646Z    0.0 seconds (IMPORT FAILED): /runpod-volume/ComfyUI/custom_nodes/ComfyUI-Inference-Core-Nodes
2024-06-03T15:58:43.237333395Z    0.0 seconds: /runpod-volume/ComfyUI/custom_nodes/cg-use-everywhere
2024-06-03T15:58:43.237338144Z    0.0 seconds: /runpod-volume/ComfyUI/custom_nodes/ComfyUI_ADV_CLIP_emb
2024-06-03T15:58:43.237342893Z    0.0 seconds: /runpod-volume/ComfyUI/custom_nodes/sd-perturbed-attention
2024-06-03T15:58:43.237347643Z    0.0 seconds: /runpod-volume/ComfyUI/custom_nodes/comfyui_ipadapter_plus
2024-06-03T15:58:43.237357141Z    0.0 seconds (IMPORT FAILED): /runpod-volume/ComfyUI/custom_nodes/ComfyUI-Manager
2024-06-03T15:58:43.237360703Z    0.0 seconds (IMPORT FAILED): /runpod-volume/ComfyUI/custom_nodes/ComfyUI-AutomaticCFG
2024-06-03T15:58:43.237366640Z    0.0 seconds: /runpod-volume/ComfyUI/custom_nodes/comfy-nodes
2024-06-03T15:58:43.237371389Z    0.0 seconds (IMPORT FAILED): /runpod-volume/ComfyUI/custom_nodes/efficiency-nodes-comfyui
2024-06-03T15:58:43.237376138Z    0.0 seconds (IMPORT FAILED): /runpod-volume/ComfyUI/custom_nodes/mikey_nodes
2024-06-03T15:58:43.237382075Z    0.0 seconds (IMPORT FAILED): /runpod-volume/ComfyUI/custom_nodes/ComfyUI_FizzNodes
2024-06-03T15:58:43.237386824Z    0.0 seconds (IMPORT FAILED): /runpod-volume/ComfyUI/custom_nodes/ComfyUI_tinyterraNodes
2024-06-03T15:58:43.237391573Z    0.1 seconds (IMPORT FAILED): /runpod-volume/ComfyUI/custom_nodes/ComfyUI-Image-Tools
2024-06-03T15:58:43.237396322Z    0.1 seconds: /runpod-volume/ComfyUI/custom_nodes/ComfyUI_essentials

Can you advise?

robballantyne commented 1 month ago

It should, you're right.

Are you running the serverless worker with the variable WORKSPACE_SYNC=true?

I suspect the problem is that the worker is trying to use the on-image Python environment rather than the synced workspace version. That shouldn't happen, so I'll look into it ASAP.
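One way to check which environment the worker is actually importing from is to ask the interpreter directly. The sketch below is a hypothetical diagnostic, not part of ai-dock: `PYTHON_BIN` is an assumed variable that you would point at whichever interpreter the worker launches ComfyUI with, and `check_modules` is an illustrative helper that prints every module name that interpreter cannot import.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic (not part of ai-dock): list the modules that a
# given interpreter fails to import. PYTHON_BIN is an assumption; set it to
# the interpreter the worker actually uses to launch ComfyUI.
PYTHON_BIN="${PYTHON_BIN:-python3}"

# check_modules INTERPRETER MODULE... -> prints each module that fails to import
check_modules() {
    local py="$1"
    shift
    local mod
    for mod in "$@"; do
        if ! "$py" -c "import ${mod}" >/dev/null 2>&1; then
            echo "$mod"
        fi
    done
}

echo "Interpreter on PATH: $(command -v "$PYTHON_BIN" || echo 'not found')"
# rembg is the module the worker log reported as missing.
check_modules "$PYTHON_BIN" rembg
```

If `rembg` is reported missing for the worker's interpreter but imports fine inside the pod, that points at the on-image vs. workspace environment mismatch described above.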

I'm not a fan of the workspace sync, though, because RunPod disks are slow and it adds a lot of runtime compared with building the image with the required nodes and models.

simonjaq commented 1 month ago

Hi. Thanks for your reply. I hadn't tried setting WORKSPACE_SYNC=true on the serverless worker yet. I did that, and it works!

I set it all up mostly following this tutorial and assumed that WORKSPACE_SYNC always has to be off in the serverless template: https://medium.com/@iknowkungfu/setup-complex-ai-image-workflows-at-scale-using-a-fleet-of-cloud-gpus-and-comfyui-1ac9a6c4b0b3

Setting up on a regular pod with sync on for the initial setup, then turning it off for serverless, worked fine, but it broke when I updated the backend.

I agree that ideally I would build images that include everything, but since I run quite complicated workflows with many custom nodes, that is well beyond my Docker skills.

Thanks a lot for your great work!

simonjaq commented 1 month ago

However, I'm now getting a lot of 'Server not ready after timeout (30s)' errors. I'll probably have to look into building a custom Docker image after all.

robballantyne commented 4 weeks ago

The timeout comes from the current mechanism for switching the environment from /opt to /workspace. It's a bad design decision that I plan to work around.

Currently the /opt/micromamba directory is removed and replaced with a symlink (to /workspace/environments/...), but the removal can take some time on slow disks. The better solution is to have the environment always set by symlinks, so there is no delete step.
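The difference between the two approaches can be sketched with plain shell commands. This is an illustration under assumptions: it uses a throwaway temp directory instead of the real /opt/micromamba and /workspace paths, the function names `delete_then_link` and `swap_link` are hypothetical, and `mv -T` is a GNU coreutils option (BSD/macOS `mv` does not have it).

```shell
#!/usr/bin/env bash
# Illustration only: temp paths stand in for the image and workspace envs.
set -euo pipefail

root="$(mktemp -d)"
mkdir -p "$root/image-env" "$root/workspace-env"
touch "$root/image-env/from-image" "$root/workspace-env/from-workspace"

# Current approach (as described): delete the directory, then symlink.
# rm -rf visits every file, so its cost grows with the size of the env
# and is slow on network disks.
delete_then_link() {
    local target="$1" link="$2"
    rm -rf "$link"
    ln -s "$target" "$link"
}

# Proposed approach: the path is always a symlink, so switching only
# re-points it. A temporary link is renamed over the old one (mv -T),
# which replaces it in a single rename(2) with no delete step.
swap_link() {
    local target="$1" link="$2"
    ln -sfn "$target" "$link.tmp"
    mv -Tf "$link.tmp" "$link"
}

# Current design: the env path starts out as a real directory.
cp -r "$root/image-env" "$root/env-a"
delete_then_link "$root/workspace-env" "$root/env-a"

# Proposed design: the env path is a symlink from the start.
ln -s "$root/image-env" "$root/env-b"
swap_link "$root/workspace-env" "$root/env-b"

ls "$root/env-a"
ls "$root/env-b"
```

The swap only works when the environment path is already a symlink, which is the point of keeping it one permanently: switching environments becomes a rename rather than a recursive delete.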

My fault, but I'll fix it as soon as I can. It's still better to build the image, though, because networked disks aren't great for this use case.

Really, syncing part of the image so that it can be modified throws away many of the benefits that Docker brings. I understand why users want to do it; I just think it introduces more points of failure, which I'd never want in a production application. I will add warnings to the documentation.

simonjaq commented 4 weeks ago

Thanks. I've started building a custom Docker image based on ai-dock. I wrote a small script that analyzes Comfy workflows and lists the GitHub repositories of their custom nodes.

I added them to .../layer1/init.sh:

NODES=(
    # Bash array elements are separated by whitespace, not commas;
    # a trailing comma would become part of the URL.
    "https://github.com/ltdrdata/ComfyUI-Manager"
    "https://github.com/Extraltodeus/ComfyUI-AutomaticCFG"
    "https://github.com/Kosinkadink/ComfyUI-AnimateDiff-Evolved"
    "https://github.com/Seedsa/Fooocus_Nodes"
    "https://github.com/chrisgoringe/cg-use-everywhere"
)
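Bash separates array elements by whitespace, not commas, so a comma written after a quoted URL is kept as part of that element. The sketch below shows the effect on the directory a repository would be cloned into. `clone_targets` is a hypothetical helper that mimics a typical clone loop, and /opt/ComfyUI/custom_nodes is an assumed install path, not confirmed ai-dock behavior.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the directory each repo URL would be cloned
# into, using the last path component of the URL as the folder name.
clone_targets() {
    local base="$1"
    shift
    local repo
    for repo in "$@"; do
        echo "$base/${repo##*/}"
    done
}

# The comma is glued onto the quoted string, forming one element.
WITH_COMMAS=(
    "https://github.com/ltdrdata/ComfyUI-Manager"
    "https://github.com/Extraltodeus/ComfyUI-AutomaticCFG",
)
WITHOUT_COMMAS=(
    "https://github.com/ltdrdata/ComfyUI-Manager"
    "https://github.com/Extraltodeus/ComfyUI-AutomaticCFG"
)

# /opt/ComfyUI/custom_nodes is an assumed location for illustration.
clone_targets /opt/ComfyUI/custom_nodes "${WITH_COMMAS[@]}"
clone_targets /opt/ComfyUI/custom_nodes "${WITHOUT_COMMAS[@]}"
```

With the commas, the second target ends in `ComfyUI-AutomaticCFG,`, and a `git clone` of that URL would most likely fail because GitHub repository names cannot contain commas.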

However, the custom nodes don't seem to be available to the worker. Do I need to remove WORKSPACE=/runpod-volume from the environment variables? Or do I still need to copy the nodes somewhere into the Docker image?