Open chengzihua opened 1 year ago
If you can export them in a single ONNX model, I think you are done. Please correct me if I understand wrong.
Or you can build multiple engines and put them inside a cuda stream.
I need to swap in different LoRA models in real time. If the SD model and the LoRA model are fused and then converted, it takes a long time, so I hope to use refit to replace the weights on the SD base model. Could you give an example here?
@chengzihua you can find a refit sample at https://github.com/NVIDIA/TensorRT/tree/release/8.6/samples/python/engine_refit_onnx_bidaf
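For reference, the refit flow in that sample looks roughly like the sketch below. This is only a sketch under assumptions, not the sample itself: it assumes the engine was serialized from a network built with the `REFIT` builder flag, and the `refit_engine` helper name and its signature are mine, not from the sample.

```python
import numpy as np

def refit_engine(engine_path, new_weights):
    """Refit a serialized TensorRT engine with updated named weights.

    Minimal sketch: the engine must have been built with
    trt.BuilderFlag.REFIT, and `new_weights` maps weight names (as
    reported by Refitter.get_all_weights()) to numpy arrays.
    """
    import tensorrt as trt  # imported lazily so the sketch parses without TensorRT installed

    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    with open(engine_path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())

    refitter = trt.Refitter(engine, logger)
    for name, array in new_weights.items():
        # set_named_weights returns False if the name is unknown or not refittable
        if not refitter.set_named_weights(name, np.ascontiguousarray(array)):
            raise RuntimeError(f"could not set weights for {name}")

    # All refittable weights must be supplied before this call succeeds
    if not refitter.refit_cuda_engine():
        raise RuntimeError("refit failed")
    return engine
```

For the LoRA use case discussed here, `new_weights` would hold the base weights with the LoRA delta already folded in on the host, so the engine never has to be rebuilt.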
I have the same confusion. How did you solve it
@hx621 you can check this sample and related codes. https://github.com/NVIDIA/TensorRT/tree/release/9.0/demo/Diffusion#generate-an-image-guided-by-a-text-prompt-and-using-specified-lora-model-weight-updates
Thanks for your reply, I will check it.
Unfortunately, there is still no solution for dynamic LoRA fusion. A feasible but heavy workaround is to fold the LoRA weights/biases into the base model before exporting to ONNX, similar to the sample and related code implemented here. Each export saves a new, large model, and the arbitrary combinations of different LoRA modules put significant pressure on storage.
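The folding step itself is just a low-rank update to each affected weight matrix. A toy numpy sketch (the function name and shapes are my own illustration, not from any sample):

```python
import numpy as np

def fold_lora(base_weight, lora_down, lora_up, alpha=1.0):
    """Fold a LoRA update into a base weight matrix.

    base_weight: (out, in) matrix from the SD base model.
    lora_down:   (rank, in) LoRA "A" matrix.
    lora_up:     (out, rank) LoRA "B" matrix.
    Returns base_weight + alpha * (lora_up @ lora_down), same shape as base_weight.
    """
    return base_weight + alpha * (lora_up @ lora_down)

# Toy check: rank-2 update on a 4x3 weight.
W = np.zeros((4, 3))
A = np.ones((2, 3))   # lora_down
B = np.ones((4, 2))   # lora_up
W_fused = fold_lora(W, A, B, alpha=0.5)
# Each entry of B @ A is 2.0; scaled by alpha=0.5 it becomes 1.0 everywhere.
```

Because the fused matrix has the same shape as the original, it can either be exported to a new ONNX model (the heavy workaround above) or handed to a refitter as a drop-in replacement for the base weight.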
You can try removing the stale models after refitting from them to save some storage.
Yes, thank you. Btw, it would be exciting to see a future implementation in ONNX/TensorRT where we can merge LoRA as a plug-in module as flexibly as in PyTorch :)
Hi @lxp3, would you mind sharing your solution for merging LoRA into a TensorRT engine?
Description
I have tried fusing the SD model with the PyTorch models of LoRA/ControlNet and then converting to TensorRT, but how can I fuse the TensorRT model of SD with the LoRA/ControlNet models in real time? Is there a sample?