Open Gabbelgu opened 3 weeks ago
I had the same issue on a MacBook Air M2 with 24 GB. The frame rate was about 2 seconds per frame. I upgraded onnxruntime to 1.19.2 and now it does about 20 frames per second.
Just remove these two lines in requirements.txt:
onnxruntime==1.17.1; sys_platform == 'darwin' and platform_machine != 'arm64'
onnxruntime-silicon==1.16.3; sys_platform == 'darwin' and platform_machine == 'arm64'
And add this one:
onnxruntime==1.19.2; sys_platform == 'darwin'
Performance should then be a lot better.
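A minimal sketch of that edit as a script, in case it helps anyone apply it consistently. The helper name `patch_requirements` is made up for illustration and is not part of roop-unleashed; the three requirement strings are copied from the lines above.

```python
# Sketch only: applies the requirements.txt edit described above to the file's text.
# patch_requirements is a hypothetical helper, not part of the project.
OLD_LINES = {
    "onnxruntime==1.17.1; sys_platform == 'darwin' and platform_machine != 'arm64'",
    "onnxruntime-silicon==1.16.3; sys_platform == 'darwin' and platform_machine == 'arm64'",
}
NEW_LINE = "onnxruntime==1.19.2; sys_platform == 'darwin'"


def patch_requirements(text: str) -> str:
    """Drop the two old macOS onnxruntime pins and add the 1.19.2 pin once."""
    kept = [line for line in text.splitlines() if line.strip() not in OLD_LINES]
    # Only append the new pin if it is not already present (keeps the edit idempotent).
    if NEW_LINE not in (line.strip() for line in kept):
        kept.append(NEW_LINE)
    return "\n".join(kept) + "\n"
```

To use it, read requirements.txt into a string, pass it through `patch_requirements`, and write the result back. Keep a backup of the original file first.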
Thank you. I tried removing the two lines and adding the one line in requirements.txt, but it is not working for me.
@BrZHub > I upgraded the onnxruntime to 1.19.2 and now it does about 20 frames per second
Can you explain how you upgraded the runtime? python -m pip install onnxruntime==1.19.2? I'm on an M1 Pro with 16 GB, which is also doing about 2 frames / second. It also seems like platform_machine == arm64 would be fairly important?
@C0untFloyd Any chance you could provide some guidance here? Am happy to do some testing and add to the wiki - have got lots of time on my hands
My requirements.txt file looks like this:
--extra-index-url https://download.pytorch.org/whl/cu118
numpy==1.26.4
gradio==4.44.0
fastapi<0.113.0
opencv-python-headless==4.9.0.80
onnx==1.17.0
insightface==0.7.3
albucore==0.0.16
psutil==5.9.6
torch==2.1.2+cu118; sys_platform != 'darwin'
torch==2.1.2; sys_platform == 'darwin'
torchvision==0.16.2+cu118; sys_platform != 'darwin'
torchvision==0.16.2; sys_platform == 'darwin'
onnxruntime==1.19.2; sys_platform == 'darwin'
onnxruntime-gpu==1.17.1; sys_platform != 'darwin'
tqdm==4.66.4
ftfy
regex
pyvirtualcam
I changed onnx and onnxruntime. The dependencies listed in this file are installed when you start runMacOS.sh, so that probably overrides anything you install manually using "pip install".
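Since the startup script may reinstall pinned versions, it can help to check what actually ended up installed. A small sketch using only the standard library; it assumes you run it with the same Python environment the app uses, and the list of package names is just the ones mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Package names taken from this thread; adjust as needed.
for name in ("onnx", "onnxruntime", "onnxruntime-silicon", "onnxruntime-gpu"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If onnxruntime still reports an older version after editing requirements.txt, the edit probably did not take effect in that environment.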
On the settings page I set the provider to "coreml".
If I run this test clip and swap all faces without adding any additional filters, it runs at an average of 11.5 FPS:
Processing clip.trim_12-39-03.mp4 took 55.71 secs, 11.52 frames/s
https://github.com/user-attachments/assets/9c7412ba-9ea3-44bb-b40d-77962e9e7005
After looking at this further and watching CPU/GPU usage, I'm not actually sure it's using CoreML, and there is no chart to see whether it is using the NPU. But upgrading the ONNX libraries did increase performance by 5x on my machine (15" MacBook Air M2), so there might be more gains to make.
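One way to narrow this down is to ask onnxruntime which execution providers the installed build offers. The sketch below uses `onnxruntime.get_available_providers()`; the `choose_providers` helper is hypothetical and only illustrates filtering a preferred list. How roop-unleashed itself selects providers is not shown here.

```python
def choose_providers(available,
                     preferred=("CoreMLExecutionProvider", "CPUExecutionProvider")):
    """Keep the preferred providers that this onnxruntime build offers, in order."""
    return [p for p in preferred if p in available]


try:
    import onnxruntime as ort
except ImportError:
    ort = None  # onnxruntime is not installed in this environment

if ort is not None:
    available = ort.get_available_providers()
    print("onnxruntime", ort.__version__)
    print("available providers:", available)
    print("would request:", choose_providers(available))
```

For a loaded model, calling `get_providers()` on the `InferenceSession` reports which providers that session registered. Even when CoreML is listed, individual operators can still fall back to the CPU, so this does not by itself prove the NPU is being used.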
Many thanks. What do you have your number of execution threads set to in settings? I'm not sure whether that refers to the CPU or the GPU. I've now tried editing requirements.txt to match yours but don't see a performance increase.
I also wondered if we could make use of https://pypi.org/project/onnxruntime-coreml/ somehow.
See also https://onnxruntime.ai/docs/execution-providers/CoreML-ExecutionProvider.html
My Python is pretty rusty, but I'm happy to collaborate with someone on this.
I have done a bit of digging, and the following appears in a number of the files that load the models:
# replace Mac mps with cpu for the moment
self.devicename = self.plugin_options["devicename"].replace('mps', 'cpu')
My guess is that no use is being made of the GPU, or at least the Metal layer. I don't have a deep enough understanding of how CoreML works to know how that all fits together.
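To make the effect of that line concrete, here is a small sketch. `effective_device` is a hypothetical stand-in that mirrors the string replacement quoted above, and `mps_available` uses PyTorch's `torch.backends.mps.is_available()` to check whether the Metal backend is visible at all. It says nothing about what the ONNX models do, only about the PyTorch device name.

```python
def effective_device(requested: str) -> str:
    """Mirror the quoted line: any 'mps' request is rewritten to 'cpu'."""
    return requested.replace("mps", "cpu")


def mps_available() -> bool:
    """Report whether PyTorch sees the Metal (MPS) backend; False if torch is missing."""
    try:
        import torch
    except ImportError:
        return False
    mps = getattr(torch.backends, "mps", None)
    return bool(mps is not None and mps.is_available())


print("requested 'mps' ->", effective_device("mps"))
print("MPS backend available:", mps_available())
```

If the second line prints True while the first still maps to cpu, the PyTorch-based processors are being kept off Metal by that replacement rather than by a missing backend.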
Describe the bug
I can't get the GPU to be utilized on my MacBook. Other apps, such as LLMs, can use up to 70 GB of RAM for the graphics processor.
To Reproduce
Steps to reproduce the behavior: I enabled CoreML, Max. Number of Threads = 18, GFPGAN and the other processors. Same problem with Max. Number of Threads = 3 and with Max. Number of Threads = 8.
My configuration is:
MacBook Pro 16" 2023
M3 Max
128 GB RAM
Python 3.11
The rate is quite low, around 1 to 2 s per frame, and it mostly hangs, not moving forward for 3-5 s, then recalculates to 1-2 s per frame.
Details What OS are you using?
Are you using a GPU?
Which version of roop unleashed are you using? 4.3.1