Achieved real-time on RTX 3090 using TensorRT, reaching speeds of 30+ FPS.

KwaiVGI / LivePortrait

Bring portraits to life!

https://liveportrait.github.io

Achieved real-time on RTX 3090 using TensorRT, reaching speeds of 30+ FPS. #150

Open warmshao opened 4 months ago

warmshao commented 4 months ago

My implementation: https://github.com/warmshao/FasterLivePortrait

New Features:

Achieved real-time performance of LivePortrait on RTX 3090 GPU using TensorRT, reaching speeds of 30+ FPS.
Implemented conversion of LivePortrait model to ONNX format, achieving inference speed of approximately 70ms/frame (~12 FPS) using onnxruntime-gpu on RTX 3090, facilitating cross-platform deployment.
Seamlessly integrated support for native Gradio app, delivering several times faster speed and enabling simultaneous inference on multiple faces. Sample results available at: PR #105
Refactored code structure to eliminate PyTorch dependency. All models now use ONNX or TensorRT for inference.
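
The figures in the list above mix per-frame latency (ms/frame) and throughput (FPS), so a small conversion helper makes them easier to compare. This is a minimal sketch; the function names are hypothetical and not part of FasterLivePortrait. Note that 70 ms of inference alone would correspond to roughly 14 FPS, so the quoted ~12 FPS presumably includes per-frame work outside the model; that reading is an assumption, not something the post states.

```python
def fps_from_latency_ms(latency_ms: float) -> float:
    """Frames per second achievable if every frame takes latency_ms."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def frame_budget_ms(target_fps: float) -> float:
    """Maximum per-frame time in milliseconds that still sustains target_fps."""
    if target_fps <= 0:
        raise ValueError("target FPS must be positive")
    return 1000.0 / target_fps


if __name__ == "__main__":
    # Numbers taken from the feature list; everything else is illustrative.
    print(f"70 ms/frame -> {fps_from_latency_ms(70):.1f} FPS")
    print(f"30 FPS needs <= {frame_budget_ms(30):.1f} ms/frame")
    print(f"12 FPS corresponds to {frame_budget_ms(12):.1f} ms/frame")
```

In other words, the TensorRT path has to finish all per-frame work in about 33 ms to hold 30 FPS, which is less than half of the quoted ONNX Runtime latency.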

FurkanGozukara commented 4 months ago

amazing work congrats

warmshao commented 4 months ago

> amazing work congrats

Thanks! The speed is truly unbelievable. Perhaps it can be used for some interesting applications.

galigaligo commented 4 months ago

I still need to compile onnxruntime-gpu myself, which is a bit discouraging.

warmshao commented 4 months ago

> I still need to compile onnxruntime-gpu myself, which is a bit discouraging

The latest onnxruntime-gpu still doesn't support grid_sample on CUDA, so we need to build it from source. But I will upload a Docker image soon, stay tuned!
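
Before building from source, it can help to see which execution providers the installed onnxruntime build exposes. The sketch below is illustrative only: `pick_providers` and its preference order are assumptions, not code from FasterLivePortrait. A listed CUDA provider also does not prove that a grid_sample CUDA kernel is present; that only shows up when the model is actually loaded and run.

```python
from typing import List, Sequence

# Provider names as used by ONNX Runtime; the preference order is an assumption.
PREFERRED = (
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
)


def pick_providers(available: Sequence[str],
                   preferred: Sequence[str] = PREFERRED) -> List[str]:
    """Return the preferred providers that are available, in preference order."""
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise RuntimeError("none of the preferred execution providers is available")
    return chosen


def available_providers() -> List[str]:
    """List the providers of the installed onnxruntime, or [] if it is missing."""
    try:
        import onnxruntime as ort
    except ImportError:
        return []
    return list(ort.get_available_providers())


if __name__ == "__main__":
    found = available_providers()
    print("available:", found or "onnxruntime is not installed")
    if found:
        print("would use:", pick_providers(found))
```

The resulting list can be passed as the `providers` argument of `onnxruntime.InferenceSession`. If only `CPUExecutionProvider` shows up, the GPU build is not the one being imported.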

juntaosun commented 4 months ago

Very good, it runs at a steady 20 FPS on an RTX 3080. 👍️

juntaosun commented 4 months ago

https://github.com/user-attachments/assets/d3ce17c8-c0d5-43c6-a3b3-f8f7d195c824

shaoguowen commented 4 months ago

FasterLivePortrait.mp4

Wow, cool! Are you using TensorRT or ONNX?

warmshao commented 4 months ago

> Very good, it runs at a steady 20 FPS on an RTX 3080. 👍️

cool

warmshao commented 4 months ago

Hi guys, I have uploaded a Docker image for running https://github.com/warmshao/FasterLivePortrait. Please try it out. I will also provide integration packages for Windows and macOS that support one-click running. Stay tuned.

vpckso commented 4 months ago

Thanks, but it's not working. (Attached images: fastliveportrait-docker, docker)

Could you share your Dockerfile so that I can build it myself?

warmshao commented 4 months ago

> Thanks, but it's not working.
>
> Could you share your Dockerfile so that I can build it myself?

You can try installing pycuda yourself: pip install pycuda. Actually, I followed the README tutorial step by step to install everything in the container and then committed it, so there's no Dockerfile.

vpckso commented 4 months ago

nvidia-smi also failed, so the image cannot be used. I have to build from scratch, but I got a lot of compile errors when following the README; it seems some library versions are not compatible. (Attached image: errors)

shaoguowen commented 4 months ago

> nvidia-smi also failed, so the image cannot be used. I have to build from scratch, but I got a lot of compile errors when following the README; it seems some library versions are not compatible.

Please refer to this: https://github.com/warmshao/FasterLivePortrait/issues/8

vpckso commented 4 months ago

Thanks, it works after fixing libcuda.so.1 and libnvidia-ml.so.1. I also needed to fix scripts/all_onnx2trt.sh to use retinaface_det_static.onnx and face_2dpose_106_static.onnx.
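
The comment does not say how the two libraries were fixed, so the sketch below only checks whether the dynamic loader can resolve them inside the container, which is a reasonable first step before touching anything. The helper name `check_shared_libs` is hypothetical; it uses Python's standard ctypes module and nothing from FasterLivePortrait.

```python
import ctypes
from typing import Dict, Iterable

# Library names taken from the comment above.
REQUIRED_LIBS = ("libcuda.so.1", "libnvidia-ml.so.1")


def check_shared_libs(names: Iterable[str] = REQUIRED_LIBS) -> Dict[str, bool]:
    """Map each library name to True if the dynamic loader can open it."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # raises OSError when the library cannot be loaded
            results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    for lib, ok in check_shared_libs().items():
        print(f"{lib}: {'found' if ok else 'MISSING'}")
```

If either library is reported missing, nvidia-smi and TensorRT are likely to fail in the same way, which matches the symptoms described earlier in the thread.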

Timings on an RTX 3060 with source/s6.jpg + driving/d0.mp4:

Official PyTorch: real 0m16.065s, user 0m19.367s, sys 0m1.738s
(the compiled model speeds this up by around 3s)

TensorRT: real 0m7.773s, user 0m11.793s, sys 0m11.129s
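
To put the wall-clock figures above side by side, the durations printed by the shell's time command can be parsed and compared. This is a minimal sketch with hypothetical helper names; it looks only at the "real" values and says nothing about where the time is spent (model loading, video decoding, and so on).

```python
import re

# Matches durations such as "0m16.065s" or "7.773s".
_TIME_RE = re.compile(r"^(?:(\d+)m)?(\d+(?:\.\d+)?)s$")


def parse_time(value: str) -> float:
    """Convert a shell `time` duration such as '0m16.065s' to seconds."""
    match = _TIME_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def speedup(baseline: str, optimized: str) -> float:
    """Ratio of baseline to optimized wall-clock time."""
    return parse_time(baseline) / parse_time(optimized)


if __name__ == "__main__":
    pytorch_real = "0m16.065s"   # official PyTorch run, from the comment
    tensorrt_real = "0m7.773s"   # TensorRT run, from the comment
    saved = parse_time(pytorch_real) - parse_time(tensorrt_real)
    print(f"time saved: {saved:.3f} s")
    print(f"speedup: {speedup(pytorch_real, tensorrt_real):.2f}x")
```

For this single clip the TensorRT run finishes in about half the wall-clock time of the PyTorch run.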

warmshao commented 4 months ago

An install-free, extract-and-play Windows package with TensorRT support is now available! Please refer to the FasterLivePortrait releases. It is really fast and very convenient!

falconwingz88 commented 3 months ago

Will this work with a video target?

warmshao commented 3 months ago

> Will this work with a video target?

yes