Open cxz91493 opened 3 days ago
LoRA is not supported on V100. If you would like TRT-LLM to support this feature, you can create an issue requesting it.
I have the same error on V100.
```shell
docker run --rm --runtime=nvidia --gpus all --entrypoint /bin/bash -it nvidia/cuda:12.4.0-devel-ubuntu22.04
```
System Info
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
I followed the steps here: https://github.com/NVIDIA/TensorRT-LLM/tree/v0.10.0/examples/llama#run-llama-with-lora
Download the base model and LoRA model from HF:
- Base model: Llama-2-7b-hf
- LoRA model: chinese-llama-2-lora-7b
1. Run on the basic Docker image environment
2. Convert the model
3. Build the engine
4. Run the model
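For reference, the convert/build/run steps above roughly correspond to the following commands. This is a sketch based on the v0.10 llama LoRA example README; the exact paths, output directory names, and flag values are assumptions and should be adjusted to your checkout:

```shell
# Sketch only — paths and flag values below are assumptions; adjust to your setup.

# 1. Convert the HF checkpoint to a TensorRT-LLM checkpoint
python convert_checkpoint.py \
    --model_dir ./Llama-2-7b-hf \
    --output_dir ./tllm_checkpoint_1gpu \
    --dtype float16

# 2. Build the engine with the LoRA plugin enabled, pointing at the adapter weights
trtllm-build \
    --checkpoint_dir ./tllm_checkpoint_1gpu \
    --output_dir ./trt_engines \
    --gemm_plugin float16 \
    --lora_plugin float16 \
    --lora_dir ./chinese-llama-2-lora-7b

# 3. Run inference with the LoRA adapter selected via its task uid
python ../run.py \
    --engine_dir ./trt_engines \
    --tokenizer_dir ./chinese-llama-2-lora-7b \
    --max_output_len 50 \
    --lora_task_uids 0 \
    --input_text "今天天气很好,我到公园的时候,"
```

These commands require a CUDA GPU and a TensorRT-LLM installation, so they are shown as a command sketch rather than a runnable script.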
""" [TensorRT-LLM] TensorRT-LLM version: 0.10.0 [06/28/2024-08:17:02] [TRT-LLM] [W] The paged KV cache in Python runtime is experimental. For performance and correctness, please, use C++ runtime. /usr/local/lib/python3.10/dist-packages/torch/nested/init.py:166: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at ../aten/src/ATen/NestedTensorImpl.cpp:177.) return _nested.nested_tensor( Input [Text 0]: "今天天气很好,我到公园的时候," Output [Text 0 Beam 0]: "sitting beside rivers surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded surrounded" """
Expected behavior
Successfully get output from the LoRA model
actual behavior
The run completes but produces degenerate output (a single repeated token) instead of a sensible completion, as shown in the log above.
additional notes
Got the tllm_checkpoint_1gpu folder after converting the model
Got the trt_engines folder after building the engine