KwaiVGI / LivePortrait

Bring portraits to life!
https://liveportrait.github.io

Live-Portrait ONNX model #126

Closed aihacker111 closed 2 months ago

aihacker111 commented 2 months ago

Here are the links to the Hugging Face model repo and the GitHub code demo for running the project with ONNX models. I have finished converting all 5 original models.

Screenshot 2024-07-12 at 19 03 10

I fixed the GridSample 5D error and successfully converted the warping model to ONNX. The other models all work fine and fast as well.

Live Portrait ONNX: https://huggingface.co/myn0908/Live-Portrait-ONNX
GitHub repo for running my ONNX: https://github.com/aihacker111/Efficient-Live-Portrait

Thank you @cleardusk and your team for the great project.

juntaosun commented 2 months ago

Well done, awesome job! TERRIFIC! 👍👍👍

zzzweakman commented 2 months ago

Thank you very much for your contribution! Could you please provide the test results for the inference speed? @aihacker111

Sunitha-Thomas commented 2 months ago

@aihacker111 Thanks for the ONNX models, they worked great. But when I ran your ONNX export script, it failed with: torch.onnx.errors.OnnxExporterError: Unsupported: ONNX export of operator GridSample with 5D volumetric input. Please feel free to request support or submit a pull request on PyTorch GitHub: https://github.com/pytorch/pytorch/issues. It's the same error again; how do I fix it?

aihacker111 commented 2 months ago

@Sunitha-Thomas It's easier than you think: just downgrade your torch version and make sure it is < 2.0.0, and it will work.
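
For reference, here is a rough sketch of what such an export could look like under that assumption (torch < 2.0.0). This is not the repo's actual export script; the dummy shapes, opset version, and file name are illustrative guesses only:

    # Hypothetical export sketch, not the repo's real script. Assumes torch < 2.0.0
    # as suggested above; shapes, opset, and names are placeholders.
    import torch

    def export_warping(warping_module: torch.nn.Module, out_path: str = "warping.onnx"):
        warping_module.eval()
        feature_3d = torch.randn(1, 32, 16, 64, 64)   # example 5-D feature volume
        kp_driving = torch.randn(1, 21, 3)             # example driving keypoints
        kp_source = torch.randn(1, 21, 3)              # example source keypoints
        torch.onnx.export(
            warping_module,
            (feature_3d, kp_driving, kp_source),
            out_path,
            input_names=["feature_3d", "kp_driving", "kp_source"],
            output_names=["out"],
            opset_version=16,  # assumption; use whatever opset your torch/onnx combo supports
        )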

aihacker111 commented 2 months ago

@zzzweakman Yeah, sure. This is the processing time when running the full original PyTorch models:

Screenshot 2024-07-12 at 20 16 56

This ran on 354 frames and took more than 1 hour. And this is the processing time when running the full ONNX models:

Screenshot 2024-07-12 at 20 18 56

With the same number of frames and the same inputs, the ONNX models are about 2x faster.

Note: I tested all of this on an Apple Silicon M1 Pro CPU, not using CUDA, and without changing anything else.

aihacker111 commented 2 months ago

To explain why the latest PyTorch and ONNX versions raise the GridSample 5D error: I checked the conversion function in ONNX, and it seems to be a bug on Microsoft's side. The export is blocked when the convolution is passed to the converting function, which is why it cannot be converted. In a lower PyTorch version (for example, torch 1.13.0) the torch and ONNX libraries match up and everything works, so it feels like we have to use a back door.

aihacker111 commented 2 months ago

@Sunitha-Thomas I already fixed the GridSample 5D issue in my source code; you can pull or git clone again.

juntaosun commented 2 months ago

After testing with CUDA mode enabled, only one model failed: “live_portraiet_onnx/onnx/warping.onnx”.

    # Enable CUDA mode
    providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
    self.w_session = ort.InferenceSession(self.warping, providers=providers)

It reported an error and the program aborted. [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running GridSample node. Name:'/dense_motion_network/GridSample' Status Message: Only 4-D tensor is supported
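
One possible workaround (my assumption, not the fix this thread eventually settled on): since the error comes from the CUDA execution provider's GridSample kernel only supporting 4-D tensors, pin just the warping session to the CPU provider and keep the other sessions on CUDA:

    # Hedged workaround sketch, not from the repo: run only warping.onnx on CPU,
    # keep everything else on CUDA. The file path is an example.
    import onnxruntime as ort

    cuda_providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
    cpu_only = ['CPUExecutionProvider']
    w_session = ort.InferenceSession('live_portrait_onnx/onnx/warping.onnx', providers=cpu_only)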

aihacker111 commented 2 months ago

@juntaosun please show me the full scripts

juntaosun commented 2 months ago

@aihacker111

Only CUDA mode is enabled; the rest of the code is the same, like this:

    def initialized_models(self):
        providers = ['CUDAExecutionProvider','CPUExecutionProvider']  
        self.m_session = ort.InferenceSession(self.motion_extractor, providers=providers)
        self.f_session = ort.InferenceSession(self.features_extractor, providers=providers)
        self.w_session = ort.InferenceSession(self.warping, providers=providers) # failed
        self.g_session = ort.InferenceSession(self.spade, providers=providers)

        self.s_session = ort.InferenceSession(self.stitch_path, providers=providers)
        self.s_l_session = ort.InferenceSession(self.lip_path, providers=providers)
        self.s_e_session = ort.InferenceSession(self.eye_path, providers=providers)

aihacker111 commented 2 months ago

@juntaosun Thank you, I'll check it later.

aihacker111 commented 2 months ago

@juntaosun With only CUDA enabled, the warping model doesn't run? How about on CPU?

aihacker111 commented 2 months ago

@juntaosun Thank you, I know where the bug is.

Echolink50 commented 2 months ago

Is this working properly? How do we run it? Just swap the models out, or is there more to it? Thanks.

aihacker111 commented 2 months ago

@Echolink50 Git clone my repo and you can run it.

aihacker111 commented 2 months ago

@juntaosun Hey, can you check and help me fix an issue where the result video is reversed compared to the original video? There is some mistake in the code.

noname0121 commented 2 months ago

@aihacker111 You said "git my repo, u can run it". Can you give more detailed instructions with a video? I just did a git clone from the address you gave and don't know what to do next.

aihacker111 commented 2 months ago

@noname0121 I will update the README later; please wait, we need to clean up all the code to make it easy to use.

noname0121 commented 2 months ago

@aihacker111 Thank you very much, have a nice day.

henryruhs commented 2 months ago

@aihacker111 Author of FaceFusion here, thanks for making this public. We need more people like you in the OS scene.

aihacker111 commented 2 months ago

@henryruhs Thank you so much, I really like your FaceFusion repo; it helped me a lot during my AI engineering journey ^_^. I'm planning to convert to TensorRT to optimize all of these models so that everyone can run them. Thank you.

aihacker111 commented 2 months ago

@juntaosun I've checked what it's related to.

aihacker111 commented 2 months ago

https://github.com/user-attachments/assets/259adee9-9d04-4224-b5bb-868bbd392ba7

This output is from the real-time webcam option, not the frame-sequence list, so it's clearly related to warping.onnx. Maybe I'm mapping a key to the wrong output; let me check.

juntaosun commented 2 months ago

@aihacker111

The problem of animation reversal has been solved.

warping_network.py

    def forward(self, feature_3d, kp_driving, kp_source):  # Parameter order

inference_portrait_onnx.py

    def warp_decode(self, feature_3d, kp_source, kp_driving):
        # Note that the positions of the parameters are swapped to match the correct input
        ort_inputs = {
            self.w_input_names[0]: np.array(feature_3d),
            self.w_input_names[1]: np.array(kp_driving),  # Old: kp_source
            self.w_input_names[2]: np.array(kp_source)    # Old: kp_driving
        }
        outputs = self.w_session.run(self.w_output_names, ort_inputs)

Now the animation works! 😀😀😀

aihacker111 commented 2 months ago

@juntaosun 😂😂😂😂😂 ONNX inputs never cease to surprise us once the arrays get shuffled.

Echolink50 commented 2 months ago

Is this working on Windows? Are you all able to use the Gradio app.py from LivePortrait?

aihacker111 commented 2 months ago

Hello @Echolink50, @juntaosun and I are still cleaning up the code and will submit it as a pull request to this project, but the inference code is working; you can try it on Windows, macOS, and Linux.

Echolink50 commented 2 months ago

OK. I am trying to install now. I tried to install just like LivePortrait, but it says some packages can't be installed in a Python 3.9 venv? Do I need to use a Python 3.10 venv?

aihacker111 commented 2 months ago

@Echolink50 You can use my repo for testing the ONNX models.

aihacker111 commented 2 months ago

@juntaosun Could you help me test the real-time mode and report the FPS? Thank you.

Echolink50 commented 2 months ago

I am. Still trying to install. I got an error when installing from requirements.txt:

    ERROR: Ignored the following yanked versions: 2.6, 2.6.1
    ERROR: Ignored the following versions that require a different python version: 3.3 Requires-Python >=3.10; 3.3rc0 Requires-Python >=3.10
    ERROR: Could not find a version that satisfies the requirement networkx==3.3

Do I need python 3.10 venv?

aihacker111 commented 2 months ago

@Echolink50 Yeah, you should create the env with conda; it's stable. Given the error above, networkx==3.3 requires Python >= 3.10, so use Python 3.10+.

aihacker111 commented 2 months ago

The ONNX models are now working stably and faster. Tomorrow I will clean up and debug the code to improve its quality, and add video summarization to reduce the number of frames and maximize inference speed. Besides that, I'm planning to convert to TensorRT and OpenVINO for optimization on CUDA.
This is a final ONNX model demo for this project:

https://github.com/user-attachments/assets/0716a9f3-531b-4876-af2d-afe54b04e2ef

Echolink50 commented 2 months ago

Got it working. It's not using the GPU for some reason. Windows 10, RTX 2060 12GB. It says it's going to take 48 minutes to run one of the demos, while the main LivePortrait branch takes only a few seconds. When you post the debugged code, could you also give a few basic install instructions or explain how the install differs from the main branch? Thanks.

galigaligo commented 2 months ago

Everything works fine, thanks for your contribution! ❤️❤️❤️ It's now very fast, at least 3 times faster than the original code on CUDA! 👍👍👍

s7--s7_concat.mp4

I did a test: 3060 graphics card, CUDA 11.8. ONNX+CUDA, 275 frames: 154.56 s; PyTorch+CUDA, 275 frames: 144.68 s. Why is mine slower?...

kitckso commented 2 months ago

I'm using a 3060 too. I haven't tried ONNX, but I tested 2xx frames with PyTorch and it only took 18 s, so your speed seems abnormal.

Echolink50 commented 2 months ago

The size of the inputs seems to affect the speed.

x4080 commented 2 months ago

Hi, does your repo work on Apple Silicon too?

kitckso commented 2 months ago

The model input seems to be 256x256, and the image is cropped and resized before processing, so input size shouldn't affect speed that much?

galigaligo commented 2 months ago

Quantized version?

For a unified test I used s10.jpg and d8.mp4, 275 frames in total. With PyTorch+CUDA, the total loop duration (including get_kp_info, stitching, parse_output, etc.) is 45 s; counting only warping_module and spade_generator, it is 7.76 s.

    mtime = 0
    start_time = time.time()
    for i in track(range(n_frames), description='Animating...', total=n_frames):
        start_time2 = time.time()
        out = self.live_portrait_wrapper.warp_decode(f_s, x_s, x_d_i_new)
        end_time2 = time.time()
        mtime += (end_time2 - start_time2)
    end_time = time.time()
    print("frams", n_frames, "all time:", end_time - start_time, "s", "main time", mtime, "s")

frams 275 all time: 45.03623104095459 s main time 7.76613974571228 s

aihacker111 commented 2 months ago

Make sure you add the providers and check whether the provider ONNX Runtime is actually using is CPU or GPU. Please install onnxruntime-gpu matching your CUDA 11.x or 12.x version, as well as the matching cuDNN version. @juntaosun used my models and ran them successfully on CUDA, with about a 3x speedup from my ONNX models:

Everything works fine, thanks to @aihacker111 for the contribution! ❤️❤️❤️ It's now very fast, at least 3 times faster than the original code on CUDA! 👍👍👍

s7--s7_concat.mp4
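
A quick way to verify which provider is actually in use (a minimal sketch assuming the standard onnxruntime Python API; the model path is just an example):

    # Minimal sketch: verify onnxruntime-gpu actually picked up CUDA.
    # If CUDA/cuDNN don't match the onnxruntime-gpu build, the session silently
    # falls back to ['CPUExecutionProvider'].
    import onnxruntime as ort

    print(ort.get_available_providers())
    session = ort.InferenceSession(
        'live_portrait_onnx/onnx/motion_extractor.onnx',  # example path
        providers=['CUDAExecutionProvider', 'CPUExecutionProvider'],
    )
    print(session.get_providers())  # shows the providers that were actually applied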

aihacker111 commented 2 months ago

@x4080 Yes, it can run on Apple Silicon, but I've only tested on the M1 Pro CPU. You can try MPS by adding CoreMLExecutionProvider to the providers.
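
Something along these lines, for example (a sketch, assuming the Core ML execution provider is available in your onnxruntime build on macOS; the model path is illustrative):

    # Sketch for Apple Silicon: try Core ML first, fall back to CPU.
    import onnxruntime as ort

    providers = ['CoreMLExecutionProvider', 'CPUExecutionProvider']
    session = ort.InferenceSession('live_portrait_onnx/onnx/warping.onnx', providers=providers)
    print(session.get_providers())  # shows which provider was actually applied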

x4080 commented 2 months ago

@aihacker111 Cool, I just git clone your repo and then run python inference_inference_onnx.py, right? It seems to download all the .pth files? I thought it only uses ONNX?

And how do I choose between the quantized and non-quantized versions?

Great work man

aihacker111 commented 2 months ago

@x4080 I'm still in the code clean-up process; you can run my repo for fun, and I'll update the code soon. Dynamic quantization is better than static, but I have an issue where the quantized models are slower than the original ONNX and I'm still fixing it, so I recommend you use the original ONNX models for better performance.
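
For anyone curious, this is roughly how a dynamic post-training quantization pass looks with onnxruntime's tooling (a generic sketch, not necessarily the exact settings used for the published quantized models; paths are examples):

    # Generic dynamic quantization sketch using onnxruntime's quantization tools.
    from onnxruntime.quantization import quantize_dynamic, QuantType

    quantize_dynamic(
        model_input='live_portrait_onnx/onnx/warping.onnx',          # example input path
        model_output='live_portrait_onnx/quant/warping_int8.onnx',   # example output path
        weight_type=QuantType.QUInt8,
    )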

x4080 commented 2 months ago

@aihacker111 Cool, no probs man

aihacker111 commented 2 months ago

@x4080 Sometimes the speed is not only affected by the model you use; it's also related to the processing code and other factors. Like I said yesterday, I'll add video summarization to reduce the number of frames and the video size for faster inference.

kitckso commented 2 months ago

@aihacker111 Do you mean the "3 times faster" is based on the original ONNX model? And do you expect the quantized model to be faster after you fix it?

galigaligo commented 2 months ago

@aihacker111 I tested the ONNX models in live_portrait_onnx/onnx/: unified testing using s10.jpg and d8.mp4, 275 frames in total, on a 3060 graphics card with CUDA 11.8.

    def initialized_models(self):
        providers = ['CUDAExecutionProvider']
        self.m_session = ort.InferenceSession(self.motion_extractor, providers=providers)
        self.f_session = ort.InferenceSession(self.features_extractor, providers=providers)
        self.w_session = ort.InferenceSession(self.warping, providers=providers)
        self.g_session = ort.InferenceSession(self.spade, providers=providers)
        self.s_session = ort.InferenceSession(self.stitch_path, providers=providers)
        self.s_l_session = ort.InferenceSession(self.lip_path, providers=providers)
        self.s_e_session = ort.InferenceSession(self.eye_path, providers=providers)

The total time for the generate function is 153.2914435863495 seconds, while the warp_decode function takes 106.74874997138977 seconds.

With PyTorch+CUDA, the total loop duration (including get_kp_info, stitching, parse_output, etc.) takes 45 seconds; counting only warping_module and spade_generator, it takes 7.76 s. So for me the ONNX speed is much slower than the original PyTorch version. Is the claimed speed improvement from the quantized version (live_portrait_onnx/quantification/)? I used the dynamic quantized model (live_portrait_onnx/quantification/dynamics), which is very slow, and I was unable to generate images properly with the static one (live_portrait_onnx/quantification/static).

Speed: frames 275, generate function: 133.769793510437 s, warp_decode function: 115.6993567466736 s

https://github.com/user-attachments/assets/285aca2f-402e-45fd-8ef8-29c219346bc1

I don't know where the problem lies

aihacker111 commented 2 months ago

@kitckso I'm not testing quantization any more; the quantized conversions are just an option for anyone who wants to try them. I don't like quantization much, because post-training quantization may make things faster but the quality is not guaranteed. I currently recommend everyone use the original ONNX models.

aihacker111 commented 2 months ago

@galigaligo Please use the non-quantized ONNX models; controlling ONNX quantization seems hard. Or you can wait for my next plan, TensorRT, after I complete the ONNX task.