Open chengzihua opened 1 year ago
If you can export them in a single ONNX model, I think you are done. Please correct me if I understand wrong.
Or you can build multiple engines and put them inside a cuda stream.
I need to swap in different LoRA models in real time. If the SD model and the LoRA model are fused and then converted, it takes a long time, so I hope to use refit to replace the weights on the SD base model. Could you give an example here?
@chengzihua you can find a refit sample at https://github.com/NVIDIA/TensorRT/tree/release/8.6/samples/python/engine_refit_onnx_bidaf
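For reference, the refit flow in that sample looks roughly like the sketch below. This is only a sketch under assumptions, not the sample itself: it assumes the engine was serialized from a network built with the `REFIT` builder flag, and the `refit_engine` helper name and its signature are mine, not from the sample.

```python
import numpy as np

def refit_engine(engine_path, new_weights):
    """Refit a serialized TensorRT engine with updated named weights.

    Minimal sketch: the engine must have been built with
    trt.BuilderFlag.REFIT, and `new_weights` maps weight names (as
    reported by Refitter.get_all_weights()) to numpy arrays.
    """
    import tensorrt as trt  # imported lazily so the sketch parses without TensorRT installed

    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    with open(engine_path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())

    refitter = trt.Refitter(engine, logger)
    for name, array in new_weights.items():
        # set_named_weights returns False if the name is unknown or not refittable
        if not refitter.set_named_weights(name, np.ascontiguousarray(array)):
            raise RuntimeError(f"could not set weights for {name}")

    # All refittable weights must be supplied before this call succeeds
    if not refitter.refit_cuda_engine():
        raise RuntimeError("refit failed")
    return engine
```

For the LoRA use case discussed here, `new_weights` would hold the base weights with the LoRA delta already folded in on the host, so the engine never has to be rebuilt.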
I have the same confusion. How did you solve it
@hx621 you can check this sample and related codes. https://github.com/NVIDIA/TensorRT/tree/release/9.0/demo/Diffusion#generate-an-image-guided-by-a-text-prompt-and-using-specified-lora-model-weight-updates
Thanks for your reply, I will check it.
Unfortunately, there is still no solution for dynamic LoRA fusion. A feasible but heavy workaround is to fold the LoRA weights/biases into the base model before exporting to ONNX, similar to the sample and related code implemented here. Each export saves a new, large model, and the arbitrary combinations of different LoRA modules put significant pressure on storage.
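The folding step itself is just a low-rank update to each affected weight matrix. A toy numpy sketch (the function name and shapes are my own illustration, not from any sample):

```python
import numpy as np

def fold_lora(base_weight, lora_down, lora_up, alpha=1.0):
    """Fold a LoRA update into a base weight matrix.

    base_weight: (out, in) matrix from the SD base model.
    lora_down:   (rank, in) LoRA "A" matrix.
    lora_up:     (out, rank) LoRA "B" matrix.
    Returns base_weight + alpha * (lora_up @ lora_down), same shape as base_weight.
    """
    return base_weight + alpha * (lora_up @ lora_down)

# Toy check: rank-2 update on a 4x3 weight.
W = np.zeros((4, 3))
A = np.ones((2, 3))   # lora_down
B = np.ones((4, 2))   # lora_up
W_fused = fold_lora(W, A, B, alpha=0.5)
# Each entry of B @ A is 2.0; scaled by alpha=0.5 it becomes 1.0 everywhere.
```

Because the fused matrix has the same shape as the original, it can either be exported to a new ONNX model (the heavy workaround above) or handed to a refitter as a drop-in replacement for the base weight.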
You can try removing the stale models after refitting from them to save some storage.
Yes, thank you. Btw, it would be exciting to see a future implementation in ONNX/TensorRT where we can merge LoRA as a plug-in module as flexibly as in PyTorch :)
Hi @lxp3, would you mind sharing your solution for merging LoRA into a TensorRT engine?
Description
I have tried fusing the SD model with the PyTorch models of LoRA/ControlNet and then converting to TensorRT, but how can I fuse the TensorRT model of SD with the LoRA/ControlNet models in real time? Is there a sample?