NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

How to use LoRA with rank 1024+? #1903

Closed NextNextDev closed 1 month ago

NextNextDev commented 2 months ago

System Info

Hello, I'm trying to apply LoRA and getting the following error. Does anyone know if there is a way to run this?

[TensorRT-LLM][ERROR] Assertion failed: Invalid low_rank (1024). low_rank must be smaller than mMaxLowRank (64)

Expected behavior

I expect to be able to use LoRA with a rank of 1024 or higher without encountering any errors.

Actual behavior

When I attempt to use LoRA with a rank of 1024, I receive an error stating that low_rank must be smaller than mMaxLowRank (64).

Additional notes

QiJune commented 2 months ago

@byshiue Could you please have a look? Thanks.

robmsmt commented 1 month ago

When you run trtllm-build you can set --max_lora_rank=256. I have used this; it's worth trying to set it to 1024.

byshiue commented 1 month ago

@robmsmt's comment is correct. If you don't set max_lora_rank when building the engine, the default is 64.
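
A minimal sketch of such a build invocation. Only the `--max_lora_rank` flag comes from this thread; the checkpoint and output directory paths are placeholders, and enabling the LoRA plugin via `--lora_plugin` is an assumption about the rest of the command line:

```shell
# Hypothetical paths; only --max_lora_rank is confirmed by this thread.
trtllm-build \
    --checkpoint_dir ./ckpt \
    --output_dir ./engine \
    --lora_plugin auto \
    --max_lora_rank 1024
```

Since mMaxLowRank is fixed at engine-build time, the engine must be rebuilt with the larger value; it cannot be raised at runtime.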