How to Sync with RunPod Network Volume

Daggle24 commented 10 months ago

Hi, I'm deploying the RunPod image on RunPod's GPU Cloud with a network volume attached. But when I terminate the pod, the data doesn't persist on the network volume, and I don't understand how to achieve this. I want to reuse this same network volume, with all the models and custom_nodes I pre-configured through ComfyUI-Manager, on the serverless cloud.

Maybe this is what WORKSPACE_SYNC=true does? I ask because I set it to false, but RunPod gets stuck on the sync.

robballantyne commented 10 months ago

WORKSPACE_SYNC is responsible for making this possible.

When set to true, the environments at /opt/micromamba are moved to $WORKSPACE/environments and symlinked back to /opt/micromamba.

/opt/ComfyUI and /opt/serverless will also be moved into $WORKSPACE and linked back.

This all happens before the ComfyUI interface is made available, so any additions to those directories will persist.
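The move-and-symlink step described above can be sketched in a few lines of shell. This is only an illustration of the idea, not the image's actual sync code: the function name move_and_link is made up, and reusing a copy that already exists on the volume is an assumption about how a second pod would pick up stored data.

```shell
# Hypothetical helper: move a directory to a new location (for example on
# the network volume) and leave a symlink behind at the original path.
move_and_link() {
  local src="$1"   # original location, e.g. /opt/ComfyUI
  local dest="$2"  # new location, e.g. $WORKSPACE/ComfyUI

  # Already a symlink from an earlier run: nothing to do.
  [ -L "$src" ] && return 0

  # Refuse to act on an empty or missing source path.
  [ -n "$src" ] && [ -d "$src" ] || return 1

  mkdir -p "$(dirname "$dest")"
  if [ -e "$dest" ]; then
    # Assumption: the volume already holds a copy from an earlier pod,
    # so the fresh local copy is discarded in favour of the stored one.
    rm -rf "$src"
  else
    mv "$src" "$dest"
  fi
  ln -s "$dest" "$src"
}
```

With this helper, a call such as move_and_link /opt/micromamba "$WORKSPACE/environments" would mirror the first step described above. Anything written through the original path afterwards lands on the volume.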

On RunPod, this can take a long time in some regions.

I will test to ensure this is happening correctly and let you know. There was a fault with some older builds, but it should be working properly now.

robballantyne commented 10 months ago

I have tested this by deploying to a network volume and destroying the instance.

I then started a new instance on a different machine using the same network volume.

The output of /var/log/timing_data is below:

Init started: 12/15/23 08:17:19.313
Mamba sync start: 12/15/23 08:17:21.968
Mamba sync complete: 12/15/23 08:17:25.973
Opt sync start: 12/15/23 08:17:25.978
Opt sync complete: 12/15/23 08:17:26.032
Provisioning start: 12/15/23 08:17:33.788
Provisioning complete: 12/15/23 08:17:44.805
Init complete: 12/15/23 08:17:44.813
ComfyUI started: 12/15/23 08:17:45.429
(comfyui) root@4db93cd8ae0f:/workspace#

This is the desired behaviour: there is only a small delay at startup. It should be faster, but these disks perform poorly with small files.
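To see where the startup time goes, the gaps between consecutive entries in a timing log like this one can be computed with a short awk helper. This is a sketch under two assumptions: every entry ends in an HH:MM:SS.mmm timestamp, and all entries fall on the same day. The function name timing_deltas is made up for illustration.

```shell
# Hypothetical helper: print the time elapsed before each logged event.
timing_deltas() {
  awk '
    # Only consider lines whose last field looks like HH:MM:SS(.mmm).
    $NF ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ {
      split($NF, t, ":")
      now = t[1] * 3600 + t[2] * 60 + t[3]
      # The label is everything before the first ": ".
      label = substr($0, 1, index($0, ": ") - 1)
      # Skip the first entry, which has no predecessor to compare with.
      if (seen) printf "%8.3f s  %s\n", now - prev, label
      prev = now
      seen = 1
    }
  ' "$1"
}
```

Running timing_deltas /var/log/timing_data would list one delta per event. For the timestamps above, subtraction gives about 4.0 s for the mamba sync, 0.05 s for the opt sync and 11.0 s for provisioning.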

Daggle24 commented 10 months ago

So, is this the normal behaviour? For me, it takes a while at this step.

I'll try with the latest commits you've made and let you know.

robballantyne commented 10 months ago

Yes, it's normal. It's just slow, and it uses more space on the volume than on the local disk. I have raised this with RunPod, who advised that it's expected behaviour with MooseFS.

A better workflow involves disabling sync completely but storing models in $WORKSPACE/storage/stable_diffusion/[*]

This gives decent performance and avoids downloading models on every startup. The provisioning script can still install nodes on startup.
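One way to get the "download once, reuse afterwards" behaviour is a small helper that skips the download when the file is already on the volume. This is a sketch rather than anything shipped in the image: fetch_once is a made-up name, and the destination is left for you to choose under $WORKSPACE/storage/stable_diffusion/.

```shell
# Hypothetical helper: download a file only if it is not already present.
fetch_once() {
  local url="$1"   # where to download from
  local dest="$2"  # full target path on the network volume

  # A non-empty file from an earlier pod means nothing needs downloading.
  if [ -s "$dest" ]; then
    echo "already present: $dest"
    return 0
  fi

  mkdir -p "$(dirname "$dest")"
  # Download under a temporary name so that an interrupted transfer is
  # never mistaken for a complete model file.
  curl -fL --retry 3 -o "$dest.part" "$url" && mv "$dest.part" "$dest"
}
```

Calling fetch_once from a provisioning script makes the first start slow and later starts fast, because the second call returns before curl is run.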

Personally, I don't use persistent storage at all. I just set up provisioning scripts to download what I need before startup, but I have tried to make the image as flexible as possible.

Norsninja commented 10 months ago

How long should the sync actually take? I am trying to set this up on RunPod. I am at the "Preparing ComfyUI..." screen, and it has been waiting for the workspace mamba sync for about 30 minutes. I was able to run through the process with another ComfyUI Docker image on RunPod, but that image did not allow ComfyUI to be updated to the latest release, which led me to your project. Any help would be appreciated, thank you.

robballantyne commented 10 months ago

It depends on the region. The network storage is very slow at times, especially when dealing with small files.

The same operation on a pod volume takes under two minutes - I have raised the issue with RunPod but I don't expect it to be fixed.

It may be best to disable workspace sync. Models will still persist on the volume, but nodes will need to be installed at startup. Provisioning can handle this for you.

Norsninja commented 10 months ago

Thank you. I followed your instructions, disabled workspace sync, and used a network volume, and I was able to load into ComfyUI. I appreciate your work and help.

FyzzLive commented 10 months ago

Does anyone have an example of a provisioning script?

robballantyne commented 10 months ago

> Does anyone have an example of a provisioning script?

By default the running container will download and run https://github.com/ai-dock/comfyui/blob/main/config/provisioning/default.sh

The script URL is set with the PROVISIONING_SCRIPT variable, and I recommend using the linked script as a template.
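Combining the two settings discussed in this thread, the pod's environment variables might look like the following. The URL is a placeholder for wherever you host your own copy of the script, not a real address. How the variables are set depends on the platform (for example in the pod or template environment settings, or with docker run -e).

```shell
# Skip the slow workspace sync on network volumes.
WORKSPACE_SYNC=false
# Hypothetical URL: replace with the location of your own provisioning script.
PROVISIONING_SCRIPT="https://example.com/my-provisioning.sh"
```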

FyzzLive commented 10 months ago

@robballantyne Thank you! If I provide my own, will it replace this one or run in addition to it?

robballantyne commented 10 months ago

It'll be run instead of the default, so your deployment is entirely customisable. I don't include any nodes or models in the image because there are so many variables to consider, so I felt the script was the best way to give users flexibility.
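For readers who want a starting point before opening the default script, here is a minimal sketch of the kind of step a custom provisioning script might contain: installing a custom node at startup. It is not taken from default.sh. The install_node name, the NODES_DIR default (ComfyUI's custom_nodes directory under /opt/ComfyUI) and the plain pip call are all assumptions; the real image may need pip to be run inside its own Python environment.

```shell
# Hypothetical provisioning fragment, not the project's default.sh.
# Assumed location of ComfyUI custom nodes; override NODES_DIR if it differs.
NODES_DIR="${NODES_DIR:-/opt/ComfyUI/custom_nodes}"

# Clone a custom node repository, or update it if it is already there.
install_node() {
  local repo_url="$1"
  local name
  name="$(basename "$repo_url" .git)"
  local target="$NODES_DIR/$name"

  if [ -d "$target" ]; then
    git -C "$target" pull --ff-only
  else
    git clone --depth 1 "$repo_url" "$target"
  fi || return 1

  # Install the node's Python dependencies when it declares any.
  # Assumption: "pip" resolves to the environment ComfyUI runs in.
  if [ -f "$target/requirements.txt" ]; then
    pip install -r "$target/requirements.txt"
  fi
}
```

A script built this way would call install_node once per repository URL. Because the whole script replaces the default, anything the default script did that you still want has to be copied over.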

FyzzLive commented 10 months ago

Okay perfect, I appreciate your work, you're a legend!

robballantyne commented 10 months ago

Happy you're finding it useful!

mogupta commented 2 months ago

@robballantyne How do we move /opt/environment to /workspace/environment? Do custom node dependencies need to be installed every time I create a new pod on RunPod? Even with serverless, I think we could have everything installed ahead of time instead of waiting for the provisioning script to install all dependencies on every run.

robballantyne commented 2 months ago

@mogupta On the latest builds you can sync to a mounted volume (this probably works on that platform, but it is untested) by running:

venv-sync comfyui

You can do this at any time, so I recommend doing it after you set up your environment the first time. It can be running in the background while you work.

On future container starts it will detect the stored environments and use them. You can also run sudo supervisorctl restart comfyui to pick up the changes immediately after the sync completes.
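The two commands from this reply can be chained so that the restart only happens when the sync succeeds. The wrapper name sync_and_restart is made up for illustration. The sketch assumes that venv-sync returns a non-zero exit status on failure, which is not confirmed in this thread.

```shell
# Hypothetical wrapper around the two commands mentioned above.
sync_and_restart() {
  # Restart ComfyUI only if the environment sync completed without error.
  venv-sync comfyui && sudo supervisorctl restart comfyui
}
```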

ai-dock / comfyui

How to Sync with RunPod Network Volume #19