Gourieff / comfyui-reactor-node

Fast and Simple Face Swap Extension Node for ComfyUI
GNU General Public License v3.0

GPU not being used with inswapper-only CPU #361

Closed ckao10301 closed 3 weeks ago

ckao10301 commented 1 month ago


What happened?

When running face swap on a video, my CPU spikes to 90-100% usage in the "Analyzing target image" and "Swapping..." phases.

Why is CPU usage so high? Shouldn't this be using the GPU instead (the driver is installed properly)? If the CPU is critical, what kind of CPU and RAM would let me run ReActor the fastest? Is multithreading or higher frequency more important? @Gourieff

Steps to reproduce the problem

Your workflow video test workflow.json

Sysinfo

Linux Mint, Chrome, RTX 3090, Threadripper 1950X, DDR4. Screenshot from 2024-07-15 02-10-21

Relevant console log

na

Additional information

No response

0002kgHg commented 1 month ago

same problem

ckao10301 commented 1 month ago

I already followed the instructions and ran install.py, but my instance isn't using the GPU at all during the image analysis and swapping phases. So it seems the inswapper model is being run only on the CPU. How do I fix this?

CUDA 12 Support - don't forget to run (Windows) install.bat or (Linux/MacOS) install.py for ComfyUI's Python enclosure or try to install ORT-GPU for CU12 manually (https://onnxruntime.ai/docs/install/#install-onnx-runtime-gpu-cuda-12x)

Amit30swgoh commented 1 month ago

same problem :(

ckao10301 commented 1 month ago

@Gourieff is this expected behavior or is it an issue with CUDA or python version?

Gourieff commented 1 month ago

@ckao10301 could you please show your pip list? The best compatibility/performance at the moment is to use Py3.10/3.11 + Cu11.8

ckao10301 commented 1 month ago

My Python is 3.10.12. It looks like CUDA is 12.2. I can't figure out how to downgrade to 11.8. Do you have any advice? I'm on Linux Mint, and the nvcc command isn't found. Do I even have CUDA installed correctly?

From nvidia-smi: Driver Version: 535.183.01, CUDA Version: 12.2

(venv) @.***:~/Desktop/ComfyUI$ nvcc
Command 'nvcc' not found, but can be installed with: sudo apt install nvidia-cuda-toolkit

(venv) @.***:~/Desktop/ComfyUI$ pip list
Package Version


aiofiles 24.1.0 aiohttp 3.9.5 aiosignal 1.3.1 albucore 0.0.12 albumentations 1.4.11 annotated-types 0.7.0 anyio 4.4.0 apt-clone 0.2.1 apturl 0.5.2 async-timeout 4.0.1 attrs 23.2.0 beautifulsoup4 4.10.0 blessings 1.7 blinker 1.4 boto3 1.34.32 botocore 1.34.144 Brlapi 0.8.3 Brotli 1.0.9 certifi 2020.6.20 cffi 1.16.0 chardet 4.0.0 click 8.0.3 cmake 3.30.0 colorama 0.4.4 coloredlogs 15.0.1 command-not-found 0.3 configobj 5.0.6 contourpy 1.2.1 cryptography 3.4.8 cstr 0.1.0 cupshelpers 1.0 cycler 0.12.1 Cython 3.0.10 dbus-python 1.2.18 defer 1.0.6 Deprecated 1.2.14 distro 1.7.0 easydict 1.13 einops 0.8.0 eval_type_backport 0.2.0 exceptiongroup 1.2.2 executing 2.0.1 eyeD3 0.8.10 fairscale 0.4.13 ffmpy 0.3.0 filelock 3.6.0 flatbuffers 24.3.25 fonttools 4.53.1 frozenlist 1.4.1 fsspec 2024.6.1 gitdb 4.0.11 GitPython 3.1.43 googleapis-common-protos 1.63.2 gpustat 0.6.0 h11 0.14.0 httpcore 1.0.5 httplib2 0.20.2 httpx 0.27.0 huggingface-hub 0.23.4 humanfriendly 10.0 idna 3.3 ifaddr 0.1.7 imageio 2.34.2 imageio-ffmpeg 0.5.1 IMDbPY 2021.4.18 img2texture 1.0.6 importlib_metadata 7.1.0 insightface 0.7.3 jeepney 0.7.1 Jinja2 3.1.4 jmespath 1.0.1 joblib 1.4.2 jsonschema 4.23.0 jsonschema-specifications 2023.12.1 keyring 23.5.0 kiwisolver 1.4.5 kornia 0.7.3 kornia_rs 0.1.5 launchpadlib 1.10.16 lazr.restfulclient 0.14.4 lazr.uri 1.0.6 lazy_loader 0.4 llvmlite 0.43.0 logfire 0.46.1 louis 3.20.0 macaroonbakery 1.3.1 Mako 1.1.3 markdown-it-py 3.0.0 MarkupSafe 2.0.1 matplotlib 3.9.1 matrix-client 0.4.0 mdurl 0.1.2 more-itertools 8.10.0 mpmath 1.3.0 multidict 6.0.5 mutagen 1.45.1 nemo-emblems 6.0.1 netaddr 0.8.0 netifaces 0.11.0 networkx 3.3 numba 0.60.0 numpy 1.26.4 nvidia-cublas-cu12 12.1.3.1 nvidia-cuda-cupti-cu12 12.1.105 nvidia-cuda-nvrtc-cu12 12.1.105 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu12 8.9.2.26 nvidia-cufft-cu12 11.0.2.54 nvidia-curand-cu12 10.3.2.106 nvidia-cusolver-cu12 11.4.5.107 nvidia-cusparse-cu12 12.1.0.106 nvidia-ml-py3 7.352.0 nvidia-nccl-cu12 2.20.5 
nvidia-nvjitlink-cu12 12.5.82 nvidia-nvtx-cu12 12.1.105 oauthlib 3.2.0 onboard 1.4.1 onnx 1.16.1 onnxruntime 1.18.1 onnxruntime-gpu 1.18.1 openai 1.36.0 opencv-python 4.10.0.84 opencv-python-headless 4.7.0.72 opentelemetry-api 1.25.0 opentelemetry-exporter-otlp-proto-common 1.25.0 opentelemetry-exporter-otlp-proto-http 1.25.0 opentelemetry-instrumentation 0.46b0 opentelemetry-proto 1.25.0 opentelemetry-sdk 1.25.0 opentelemetry-semantic-conventions 0.46b0 packaging 21.3 PAM 0.4.2 pandas 2.2.2 pexpect 4.8.0 piexif 1.1.3 pilgram 1.2.1 Pillow 9.5.0 pip 24.1.2 platformdirs 4.2.2 pooch 1.8.2 prettytable 3.10.2 protobuf 4.25.3 psutil 5.9.0 ptyprocess 0.7.0 py-cpuinfo 9.0.0 pycairo 1.20.1 pycparser 2.22 pycryptodomex 3.11.0 pycups 2.0.1 pycurl 7.44.1 pydantic 2.8.2 pydantic_core 2.20.1 pyelftools 0.27 Pygments 2.18.0 PyGObject 3.42.1 PyICU 2.8.1 pyinotify 0.9.6 PyJWT 2.3.0 pymacaroons 0.13.0 PyMatting 1.1.12 PyNaCl 1.5.0 pyparsing 2.4.7 pyparted 3.11.7 pyRFC3339 1.1 python-apt 2.4.0+ubuntu3 python-dateutil 2.9.0.post0 python-debian 0.1.43+ubuntu1.1 python-dotenv 1.0.1 python-gnupg 0.4.8 python-magic 0.4.24 python-xlib 0.29 pytz 2022.1 pyxdg 0.27 PyYAML 5.4.1 qrcode 7.3.1 referencing 0.35.1 regex 2024.5.15 rembg 2.0.57 reportlab 3.6.8 requests 2.25.1 requests-file 1.5.1 rich 13.7.1 rpds-py 0.19.0 s3transfer 0.10.2 safetensors 0.4.3 scikit-image 0.24.0 scikit-learn 1.5.1 scipy 1.14.0 screen-resolution-extra 0.0.0 seaborn 0.13.2 SecretStorage 3.3.1 segment-anything 1.0 sentencepiece 0.2.0 setproctitle 1.2.2 setuptools 59.6.0 six 1.16.0 smmap 5.0.1 sniffio 1.3.1 soundfile 0.12.1 soupsieve 2.3.1 spandrel 0.3.4 sympy 1.13.0 systemd-python 234 threadpoolctl 3.5.0 tifffile 2024.7.2 timm 1.0.7 tinycss2 1.1.1 tldextract 3.1.2 tokenizers 0.19.1 tomli 2.0.1 torch 2.3.1+cu121 torchaudio 2.3.1+cu121 torchsde 0.2.6 torchvision 0.18.1+cu121 tqdm 4.66.4 trampoline 0.1.2 transformers 4.42.4 triton 2.3.1 typing_extensions 4.12.2 tzdata 2024.1 ubuntu-drivers-common 0.0.0 ufw 0.36.1 
ultralytics 8.2.57 ultralytics-thop 2.0.0 Unidecode 1.3.3 urllib3 1.26.5 wadllib 1.3.6 wcwidth 0.2.13 webencodings 0.5.1 websockets 9.1 wheel 0.37.1 wrapt 1.16.0 xdg 5 xkit 0.0.0 xlrd 1.2.0 yarl 1.9.4 yt-dlp 2022.4.8 zipp 1.0.0
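
The list above shows both onnxruntime and onnxruntime-gpu installed at 1.18.1. ONNX Runtime's Python documentation recommends keeping only one of these two packages in an environment, so this is worth checking, although the thread does not confirm it as the cause here. A minimal sketch, using only the standard library, that reports which of the two are installed (the helper name `installed_ort_packages` is hypothetical):

```python
from importlib import metadata


def installed_ort_packages(names=("onnxruntime", "onnxruntime-gpu")):
    """Return {distribution name: version} for each named package that is installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # not installed in this environment
    return found


if __name__ == "__main__":
    found = installed_ort_packages()
    print(found)
    if len(found) > 1:
        # Assumption: the CPU-only build may shadow the GPU build when both are present.
        print("Both builds are installed; consider keeping only onnxruntime-gpu.")
```

Run it with the same Python interpreter that launches ComfyUI (the venv or the portable build's embedded Python), since each environment has its own package set.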


webfiltered commented 1 month ago

It is not expected behaviour.

Instead of downgrading, I upgraded everything. Resolved on Windows using (latest @ post date where not specified):

Troubleshooting:

Below is possibly just misdirection, ping me if I should remove it, but in case it helps:

scripts/reactor_swapper.py and scripts/r_faceboost/restorer.py both set providers = [...]. I configured these explicitly per onnxruntime examples: providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
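
A minimal sketch of the explicit provider list described above. `onnxruntime.get_available_providers()` and the `providers=` argument of `InferenceSession` are real ONNX Runtime APIs; the helper `pick_providers` and the model path are hypothetical, and this is not the actual code in reactor_swapper.py. The import is guarded so the snippet also runs where ONNX Runtime is absent.

```python
PREFERRED = ["CUDAExecutionProvider", "CPUExecutionProvider"]


def pick_providers(available, preferred=PREFERRED):
    """Keep the preferred providers, in order, that this build actually offers."""
    return [p for p in preferred if p in available]


try:
    import onnxruntime as ort
    available = ort.get_available_providers()
except ImportError:
    ort = None
    available = ["CPUExecutionProvider"]  # assumption, for illustration only

providers = pick_providers(available)
print("available:", available)
print("requested:", providers)

# With a real model file (hypothetical path), the session reports what it ended up using:
# session = ort.InferenceSession("model.onnx", providers=providers)
# print(session.get_providers())
```

If CUDAExecutionProvider is missing from the available list, the installed ONNX Runtime build likely has no CUDA support, and editing the providers list alone would not be expected to move inference to the GPU.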

I also changed the onnx logging levels in reactor_patcher.py and reactor_utils.py: onnxruntime.set_default_logger_severity(3) -> log level 1/2.
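
The logging change can be sketched the same way. `onnxruntime.set_default_logger_severity` is a real call, and the level names below follow ONNX Runtime's severity scale (0 verbose through 4 fatal). The helper `set_ort_log_level` is hypothetical. Lowering the level from 3 makes warnings visible, which is where a failed CUDA provider load would typically be reported.

```python
SEVERITY = {"verbose": 0, "info": 1, "warning": 2, "error": 3, "fatal": 4}


def set_ort_log_level(name):
    """Set ONNX Runtime's default log severity by name and return the numeric level."""
    level = SEVERITY[name]
    try:
        import onnxruntime
        onnxruntime.set_default_logger_severity(level)
    except ImportError:
        pass  # ONNX Runtime is not installed; nothing to configure
    return level


set_ort_log_level("warning")
```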

ckao10301 commented 1 month ago

@webfiltered I followed your steps and they worked! Face swapping is waaay faster now. I'm using the ComfyUI manual install method rather than ComfyUI portable with the embedded Python (don't know if it makes a difference). Running on Windows 11. Got it to work on Linux too. Thank you so much!

webfiltered commented 1 month ago

No worries!

There's a workaround, too, if anyone can't use those versions. May be Windows only. Ensure you run Comfy in the foreground (e.g. via a terminal / cmd / powershell window), and just alt + tab to that window after you queue a prompt.

For me at least, it was 30x+ faster than with the console hidden, and it may actually run faster than CUDA. Worth testing if you're doing large batches.

ckao10301 commented 1 month ago

Oh interesting, why would making the terminal that's running comfy active make it run faster?


webfiltered commented 1 month ago

Short answer is UX. This is probably an incomplete answer, and based on the magnitude of difference, also inaccurate.

The OS has *almost no idea that your Comfy browser tab is actually a local program, so the Comfy process will not receive priority. The thing which the user is interacting with is the most important thing, for an end user OS.

*Yes it probably has some idea and figuring it out is ridiculously simple, but this is an edge case at present, so expect to work around the limitation manually.

alperc84 commented 1 month ago

Yes, I had the same problem. The GPU was only used in the restoring stage. In the other stages, GPU usage was zero and processing was very slow. My processor was reaching 94 °C. Applying @webfiltered's solution step by step in the ComfyUI portable version fixed the problem. Now all stages show very high GPU usage, which has increased speed by maybe 10x. I used CUDA 12.4 for the solution, btw. Thank you.