ModelSurge / sd-webui-comfyui

An extension to integrate ComfyUI workflows into the Webui's pipeline
MIT License

Too many open files (24) #54

Closed Ainaemaet closed 1 year ago

Ainaemaet commented 1 year ago

Hello,

When using this plugin, I get the following error after enabling it. It doesn't happen immediately on install, but the ComfyUI tab doesn't show up right away, so I close and restart my WSL2 Ubuntu instance and re-launch sd-webui; then I get:

Traceback (most recent call last):
  File "/home/MYUSERNAME/anaconda3/envs/automatic/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
  File "/home/MYUSERNAME/anaconda3/envs/automatic/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
  File "/home/MYUSERNAME/anaconda3/envs/automatic/lib/python3.10/site-packages/torch/multiprocessing/reductions.py", line 369, in reduce_storage
RuntimeError: unable to open shared memory object </torch_1545_703365300_501> in read-write mode: Too many open files (24)

If I then try to connect to sd-webui via the browser, I get a string of similar errors that goes on indefinitely (it could be forever, but I always Ctrl+C the terminal after a minute or two).

I have the following plugins installed alongside this one: Mask2Background, SadTalker, TemporalKit, a1111-mini-paint, a1111-sd-webui-lycoris, a1111-sd-webui-tagcomplete, canvas-zoom, danbooru-prompt, deforum-for-automatic1111-webui, ebsynth_utility, gif2gif, infinite-zoom-automatic1111-webui, model_preset_manager, multidiffusion-upscaler-for-automatic1111, sd-3dmodel-loader, sd-canvas-editor, sd-extension-aesthetic-scorer, sd-extension-steps-animation, sd-webui-3d-editor, sd-webui-agent-scheduler, sd-webui-controlnet, sd-webui-infinite-image-browsing, sd-webui-llul, sd-webui-panorama-viewer, sd-webui-regional-prompter, sd-webui-text2video, sd-webui-txt-img-to-3d-model, stable-diffusion-NPW, video_loopback_for_webui

I can confirm that removing the ComfyUI extension folder from sd-webui/extensions returns everything to working order.

ljleb commented 1 year ago

Please share the complete stack trace, not just a small section. Does the exact message repeat itself, or is it a little different every time? For example, are the numbers in </torch_1545_703365300_501> changing?

It seems related to the way the model is shared between processes, which we are working on rewriting ATM.

I found a way to serialize the unet call parameters and return value instead of the entire state dict, although it seems elaborate to implement AFAICT, so I don't have an ETA.
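
As a rough illustration of that direction, assuming hypothetical names (`request_queue`, `response_queue`, `unet_worker` are not the extension's actual API): the process that owns the model serves calls, so only the call's small tensors cross the process boundary instead of the full state dict.

```python
# Sketch only, not the extension's actual code: the model never leaves the
# process that owns it; only the unet call's inputs and output go through
# the queues, so no shared-memory file descriptor is opened per weight tensor.
import torch

def unet_worker(request_queue, response_queue, unet):
    while True:
        x, timestep, cond = request_queue.get()  # inputs from the other process
        with torch.no_grad():
            out = unet(x, timestep, cond)
        response_queue.put(out)  # only the result crosses the boundary
```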

ljleb commented 1 year ago

Does this help if you merge it locally?

https://github.com/ModelSurge/sd-webui-comfyui/pull/51/files

IIUC, a quick fix might be to share one key at a time and garbage collect the shared memory after making a copy in the ComfyUI process. It shouldn't take more memory than we are already using.
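
A hedged sketch of that "one key at a time" idea (the function names are made up, not the extension's code): each tensor is queued individually, copied on the receiving side, and the shared handle is dropped before the next key arrives, so only one shared-memory file descriptor is open at a time.

```python
# Sender side (webui process): putting a tensor on a torch.multiprocessing
# queue moves its storage to shared memory.
def send_state_dict(queue, state_dict):
    for key, tensor in state_dict.items():
        queue.put((key, tensor))
    queue.put(None)  # sentinel: no more keys

# Receiver side (comfyui process): clone() makes a private copy, and dropping
# the shared tensor lets its file descriptor be released.
def receive_state_dict(queue):
    state_dict = {}
    while (item := queue.get()) is not None:
        key, shared_tensor = item
        state_dict[key] = shared_tensor.clone()
        del shared_tensor
    return state_dict
```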

ljleb commented 1 year ago

Also, we should consider not sharing the model at all if the webuiCheckpointLoader node isn't used.

To achieve this, we could start or stop the consumer thread whenever a node is added to or removed from the node editor, or something like that.
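
Something like this, purely as a sketch with made-up names (`consume_shared_model`, `stop_consumer`, and `node.type` are all hypothetical here):

```python
import threading

consumer_thread = None

def on_graph_changed(nodes):
    """Keep the consumer thread alive only while a webuiCheckpointLoader node exists."""
    global consumer_thread
    uses_loader = any(node.type == "webuiCheckpointLoader" for node in nodes)
    if uses_loader and consumer_thread is None:
        consumer_thread = threading.Thread(target=consume_shared_model, daemon=True)
        consumer_thread.start()
    elif not uses_loader and consumer_thread is not None:
        stop_consumer()  # hypothetical shutdown signal
        consumer_thread = None
```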

PladsElsker commented 1 year ago

> Please share the complete stack trace, not just a small section. Does the exact message repeat itself, or is it a little different every time? For example, are the numbers in </torch_1545_703365300_501> changing?
>
> It seems related to the way the model is shared between processes, which we are working on rewriting ATM.
>
> I found a way to serialize the unet call parameters and return value instead of the entire state dict, although it seems elaborate to implement AFAICT, so I don't have an ETA.

I tested, and I get the exact same error with WSL:

Creating model from config: /home/plads/stable-diffusion-webui/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying attention optimization: xformers... done.
Textual inversion embeddings loaded(0):
Model loaded in 5.7s (load weights from disk: 1.0s, create model: 0.5s, apply weights to model: 2.3s, apply half(): 0.7s, move model to device: 0.6s, scripts callbacks: 0.5s, calculate empty prompt: 0.1s).
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
  File "/usr/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
  File "/home/plads/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/multiprocessing/reductions.py", line 369, in reduce_storage
RuntimeError: unable to open shared memory object </torch_903_56158203_503> in read-write mode: Too many open files (24)

PladsElsker commented 1 year ago

One way to work around this is to run this command before launching the webui:

ulimit -n 3000

It increases the maximum number of open file descriptors to 3000. It's not the right way to fix the issue, but you can use it while we work on a proper fix.

I think 3000 should be fine, but if it still doesn't work, you can try increasing the number even further.
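
For reference, the same limit can also be raised from Python before any subprocess is spawned, using the standard resource module (Linux only); this is a generic illustration of the workaround, not something the extension does:

```python
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
if soft < 3000:
    # The soft limit can be raised up to the hard limit without privileges;
    # asking for more than the hard limit raises ValueError.
    resource.setrlimit(resource.RLIMIT_NOFILE, (min(3000, hard), hard))
```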

Ainaemaet commented 1 year ago

Thank you @John-WL, that works well enough for now. :)

ljleb commented 1 year ago

@Ainaemaet as of the latest version of the extension, you shouldn't need the workaround anymore. Please let us know if anything does not work as intended.