NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Use 2 Lora in one request #2038

Open Alireza3242 opened 1 month ago

Alireza3242 commented 1 month ago

I have a base model: model_0. I created a LoRA for instruction tuning: lora_1. Then we merged model_0 + lora_1 to create model_1. Then we created a LoRA based on model_1 for DPO: lora_2.

The final model is model_0 + lora_1 + lora_2.

But at inference time I can only use one LoRA in lora_config:

lora_config = trtllm.LoraConfig(task_id=task_id, weights=weights, config=config)
trtllm.Request(input_token_ids=input_ids, ..., lora_config=lora_config)

I want to use lora_1 and lora_2 together in a single request.

TheCodeWrangler commented 1 month ago

Can you just use your base model to be model_1? Or are you needing to call model_0 still?

Might be worth going through the matrix operations to determine if you can come up with lora_3 which is the equivalent of lora_1 + lora_2 (doesn't seem likely to me though)

Alireza3242 commented 1 month ago

> Can you just use your base model to be model_1? Or are you needing to call model_0 still?
>
> Might be worth going through the matrix operations to determine if you can come up with lora_3 which is the equivalent of lora_1 + lora_2 (doesn't seem likely to me though)

I need to call model_0. Because I may need to add other Loras. Probably lora_3=lora_1+lora_2 will solve my problem. I will try it.
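For the lora_3 = lora_1 + lora_2 idea discussed above, one approach (a sketch, not TensorRT-LLM API code; shapes and names are hypothetical) is to stack the two adapters along the rank dimension. Since each LoRA contributes an additive delta B @ A to the base weight, concatenating A matrices row-wise and B matrices column-wise yields a single rank-(r1+r2) adapter whose delta equals the sum of the two:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r1, r2 = 8, 2, 3  # hypothetical hidden size and the two LoRA ranks

# Two LoRA deltas applied additively to the same base weight W:
#   W' = W + B1 @ A1 + B2 @ A2
A1, B1 = rng.standard_normal((r1, d)), rng.standard_normal((d, r1))
A2, B2 = rng.standard_normal((r2, d)), rng.standard_normal((d, r2))

# Stack along the rank dimension to form one rank-(r1 + r2) adapter.
A3 = np.concatenate([A1, A2], axis=0)  # shape (r1 + r2, d)
B3 = np.concatenate([B1, B2], axis=1)  # shape (d, r1 + r2)

# The combined delta reproduces the sum of the two deltas exactly.
assert np.allclose(B3 @ A3, B1 @ A1 + B2 @ A2)
```

Note this is exact only when both deltas are expressed against the same base weights; since lora_2 was trained on top of model_1 = model_0 + lora_1, its delta is still additive and the stacking still applies, but any per-adapter scaling (alpha / rank) must be folded into the matrices first.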

github-actions[bot] commented 1 week ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.