newgrit1004 opened 9 months ago
Checking internally.
@zerollzeng, thank you for your response.
I have reproduced the error, so you can follow the process below.

1. Clone the repository.
```bash
git clone https://github.com/newgrit1004/TensorRT.git -b refit_engine_debug --single-branch
cd TensorRT
```
2. Build the Docker image for debugging the refit engine.
```bash
docker build -t lora_debug .
```
3. Run the container using the Docker Compose YAML.
```bash
docker compose up -d
```
4. Run the FastAPI server.
```bash
docker exec -it lora_test_container bash
# inside the container
cd $TRT_OSSPATH/demo/Diffusion
uvicorn main:app --reload --host=0.0.0.0
```
5. Wait for the PyTorch models to download and for the ONNX models and TensorRT engines to build.
6. Run the client.
```bash
docker exec -it lora_test_container bash
# inside the container
cd $TRT_OSSPATH/demo/Diffusion
python client.py
```
7. Check the three outputs in the output folder.
These outputs reproduce exactly what I wanted to show.
Note: you should check the available GPU ID inside the Docker Compose YAML and the host IP inside demo/Diffusion/client.py.
There are a few changes to the engine-refitting part inside demo/Diffusion/stable_diffusion_pipeline.py; the sketch below shows the general flow.
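For context, the refit path follows the standard TensorRT Refitter flow. A minimal sketch of what I mean (the `refit_engine` helper and `weight_dict` argument are illustrative names, not the exact demo code):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def refit_engine(engine: trt.ICudaEngine, weight_dict: dict) -> bool:
    """Refit an already-built engine in place with new weights.

    weight_dict maps refittable weight names to numpy arrays whose
    shapes and dtypes match the weights the engine was built with.
    The engine must have been built with the REFIT flag.
    """
    refitter = trt.Refitter(engine, TRT_LOGGER)
    for name in refitter.get_all_weights():
        if name in weight_dict:
            refitter.set_named_weights(name, trt.Weights(weight_dict[name]))
    # Returns False if any required weights are still missing
    return refitter.refit_cuda_engine()
```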
Please let me know if I am using TensorRT incorrectly.
Description
https://github.com/NVIDIA/TensorRT/tree/release/9.2/demo/Diffusion
I've been working on the Stable Diffusion example in TensorRT 9.2, specifically on refitting weights using LoRA. First of all, I appreciate how easy it is to refit the engine weights with LoRA.
However, I'm encountering an issue when attempting to refit the engine with the original weights after applying LoRA. The problem arises from the sequence of different LoRA injections: the resulting images are unintended and look like a blend of LoRA A and LoRA B.
To work around this, I reload the TensorRT engine, and then I get the images I intended. However, this approach has the bottleneck of reloading the TensorRT engine on every request when the server runs in production.

Not working (single engine, refit only): LoRA A (image A) -> LoRA B (image B) -> LoRA A (looks like 0.3 × image A + 0.7 × image B)

Working (reload engine between LoRAs): load engine -> LoRA A (image A) -> reload engine -> LoRA B (image B) -> reload engine -> LoRA A (image A)
The attached images from the case that does not work properly are below.
image A, LoRA: sayakpaul/sd-model-finetuned-lora-t4
image B, LoRA: WuLing/Genshin_Bennett_LoRA
unintended result: looks like 0.3 × image A + 0.7 × image B
I tried loading two engines: one for refitting weights with LoRA, and another for keeping the original weights. The lines I tried to modify are here: https://github.com/NVIDIA/TensorRT/blob/93b6044fc106b69bce6751f27aa9fc198b02bddc/demo/Diffusion/utilities.py#L207. I also tried refitting twice: the first time with the original weights and the second time with the UNet weights plus LoRA (https://github.com/NVIDIA/TensorRT/blob/93b6044fc106b69bce6751f27aa9fc198b02bddc/demo/Diffusion/stable_diffusion_pipeline.py#L451), as sketched below.
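In code, the second attempt is roughly the following (a sketch only, reusing the hypothetical `refit_engine` helper from my comment above; `original_weights` and `lora_a_weights` are assumed to be name-to-array dicts captured before any LoRA was applied):

```python
# Attempt 2: refit back to the original weights first, then refit again
# with the LoRA-merged weights. Each call returns True only if every
# refittable weight was supplied successfully.
assert refit_engine(engine, original_weights)  # restore pre-LoRA weights
assert refit_engine(engine, lora_a_weights)    # inject LoRA A on top
```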
However, this is hard for me to debug. Could you let me know an easier way to do it? I can print the weight roles from a deserialized engine, but it is hard to print the weight values themselves.
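For reference, this is how far I can get when inspecting a deserialized engine (a sketch; `unet.trt` is a placeholder path, and as far as I can tell TensorRT 9.2 exposes weight names and roles but no API to read the weight values back out):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize an engine from disk ("unet.trt" is a placeholder path)
with open("unet.trt", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

refitter = trt.Refitter(engine, TRT_LOGGER)

# Names of all refittable weights (populated when the engine was built
# with named weights, e.g. via the ONNX parser)
for name in refitter.get_all_weights():
    print(name)

# (layer name, WeightsRole) pairs -- this is the "weight role" info I can print
layer_names, roles = refitter.get_all()
for layer, role in zip(layer_names, roles):
    print(layer, role)
```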
Environment
TensorRT Version: 9.2.0.post12.dev5
NVIDIA GPU: NVIDIA A30, GPU memory: 24.0 GB
NVIDIA Driver Version: 515.105.01
CUDA Version: V12.1.105
CUDNN Version: 8.9.3
Operating System: Linux, Distribution: Linux-5.4.0-149-generic-x86_64-with-glibc2.35
Python Version (if applicable): 3.10.6
TensorFlow Version (if applicable): N/A
PyTorch Version (if applicable): 2.1.0a0+b5021ba
Baremetal or Container (if so, version): Container
Relevant Files
Model link: runwayml/stable-diffusion-v1-5 https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main
Steps To Reproduce
Commands or scripts:
The old LoRA format, such as sayakpaul/sd-model-finetuned-lora-t4, currently does not work: https://github.com/NVIDIA/TensorRT/blob/93b6044fc106b69bce6751f27aa9fc198b02bddc/demo/Diffusion/models.py#L213
You can modify the LoraLoader code to inject old-format LoRA based on this PR: https://github.com/NVIDIA/TensorRT/pull/3595
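As a sanity check outside TensorRT, the same old-format checkpoint can be loaded with diffusers directly (this assumes a recent diffusers version where `load_lora_weights` accepts the old attention-processor format):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the base model and apply the old-format LoRA with diffusers alone,
# to confirm the checkpoint works before debugging the TensorRT path.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("sayakpaul/sd-model-finetuned-lora-t4")
image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("lora_check.png")
```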
Have you tried the latest release?: Yes
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (`polygraphy run <model.onnx> --onnxrt`):