-
### 📚 The doc issue
https://docs.vllm.ai/en/latest/models/lora.html describes the steps to load a LoRA model.
```
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b…
```
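On that page, an adapter is registered at startup via `--lora-modules` and then selected per request through the `model` field of the OpenAI-compatible API. A minimal sketch of querying it, assuming the server above is running on the default port 8000 and serves an adapter named `sql-lora` as in the docs:
```python
# Sketch: querying a LoRA adapter through the OpenAI-compatible server.
# Assumes the server above is up on localhost:8000 and an adapter was
# registered under the name "sql-lora" via --lora-modules (as in the docs).
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "sql-lora",  # adapter name, not the base model
        "prompt": "Write a SQL query that counts users by country.",
        "max_tokens": 64,
    },
)
print(resp.json()["choices"][0]["text"])
```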
-
### Expected Behavior
1. The number of LoRA weights matches the number of LoRAs applied to the generated image
2. The LoRA weight is updated correctly when "Reuse parameters" is pressed
3. The LoRA weight in …
-
### System Info
GPU Name: NVIDIA A800
TensorRT-LLM: 0.10.0
Nvidia Driver: 535.129.03
OS: Ubuntu 22.04
triton-inference-server backend: tensorrtllm_backend
### Who can help?
_No response_
### I…
-
Run the code below:
```
python -m vllm.entrypoints.api_server \
    --model meta-llama/Llama-2-7b-hf \
    --enable-lora \
    --lora-modules sql-lora=~/.cache/huggingface/hub/models--yard1--llama-2-7b-…
```
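The same adapter can also be exercised through vLLM's offline Python API, which is how the LoRA docs demonstrate it; a minimal sketch, with the long cached adapter path shortened to a placeholder:
```python
# Sketch: loading the sql-lora adapter via vLLM's offline API, following
# the pattern in the vLLM LoRA docs. Replace the placeholder path with the
# real adapter location from the command above.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

outputs = llm.generate(
    ["Write a SQL query that lists all tables."],
    SamplingParams(max_tokens=64),
    # (adapter name, unique integer id, local path to adapter weights)
    lora_request=LoRARequest("sql-lora", 1, "/path/to/sql-lora"),
)
print(outputs[0].outputs[0].text)
```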
-
RTX 4090 24 GB, Qwen-7B-Chat loads OK:
```
model_config = ModelConfig(lora_infos={
    "lora_1": conf['lora_1'],
    "lora_2": conf['lora_2'],
})
model = ModelFactory.from_huggingface(conf['b…
```
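For context, `lora_infos` appears to map adapter names to checkpoint locations; a hypothetical `conf` for the snippet above, with names and paths invented purely for illustration:
```python
# Hypothetical `conf` for the snippet above. The adapter names and paths
# are invented for illustration and are not taken from the report.
conf = {
    "base_model": "Qwen/Qwen-7B-Chat",
    "lora_1": "/models/qwen-7b-chat-lora-task-a",
    "lora_2": "/models/qwen-7b-chat-lora-task-b",
}
```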
-
Hi, thanks for your wonderful work.
I am struggling to use my LoRA-tuned model.
I followed these steps:
1. Fine-tuning with LoRA (see the sketch below)
- base model: Undi95/Meta-Llama-3-8B-Instruct-hf
- llama3 …
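A minimal sketch of step 1, assuming the Hugging Face PEFT stack (the hyperparameters are illustrative, not taken from the report):
```python
# Sketch of LoRA fine-tuning (step 1) with Hugging Face PEFT. The rank,
# alpha, and target modules below are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("Undi95/Meta-Llama-3-8B-Instruct-hf")
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # sanity check before training
# ... run training, then save only the adapter weights:
# model.save_pretrained("llama3-8b-instruct-lora")
```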
-
I have faced an error with the vLLM framework when I tried to run inference on an Unsloth fine-tuned Llama3-8B model...
### Error:
```shell
(venv) ubuntu@ip-192-168-68-10:~/ans/vllm-server$ python -O -u -m vl…
```
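A workaround that often helps when vLLM refuses an Unsloth-produced adapter is to merge the adapter into the base weights first and serve the merged checkpoint; a minimal sketch using PEFT, where the base model id and all paths are assumptions:
```python
# Sketch: merging a LoRA adapter into the base model with PEFT so vLLM can
# load it as a plain checkpoint. The base model id and paths are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
model = PeftModel.from_pretrained(base, "/path/to/unsloth-adapter")
model = model.merge_and_unload()  # fold the LoRA deltas into the base weights
model.save_pretrained("/path/to/merged-model")
AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B").save_pretrained(
    "/path/to/merged-model"
)
```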
-
### Python Version
```shell
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
```
### Pip Freeze
```shell
absl-py==2.1.0
annotated-types==0.7.0
anyio==4.0.0
argon2-cffi==23.1.…
```
-
### Problem statement:
In a production system, there should be an API to add/remove fine-tuned weights dynamically. The inference caller should not have to specify the LoRA location with each call.
Cur…
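For reference, newer vLLM versions expose runtime adapter management of roughly this shape when the server is started with `VLLM_ALLOW_RUNTIME_LORA_UPDATING=True`; a minimal sketch (endpoint availability depends on the vLLM version, and the adapter name/path are assumptions):
```python
# Sketch of the requested add/remove API, as exposed by newer vLLM versions
# when VLLM_ALLOW_RUNTIME_LORA_UPDATING=True is set. Name/path are assumptions.
import requests

BASE = "http://localhost:8000"

# Register a new adapter at runtime, without restarting the server.
requests.post(f"{BASE}/v1/load_lora_adapter", json={
    "lora_name": "sql-lora",
    "lora_path": "/path/to/sql-lora",
})

# Later, remove it again.
requests.post(f"{BASE}/v1/unload_lora_adapter", json={"lora_name": "sql-lora"})
```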
-
### Your current environment
None
### How would you like to use vllm
I hope to deploy the Llama3-70B model on a server with 8 RTX 3090 GPUs. When I enable the enable_lora switch, the system will defi…
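A sketch of the intended setup through vLLM's offline Python API: tensor parallelism across the 8 GPUs plus LoRA enabled. Whether the 70B weights fit on 8 x 24 GB cards depends on quantization and other settings; the LoRA limits below are illustrative:
```python
# Sketch: Llama-3-70B across 8 GPUs with LoRA enabled. max_loras and
# max_lora_rank are illustrative; fitting 70B on 8x24 GB may additionally
# require quantization.
from vllm import LLM

llm = LLM(
    model="meta-llama/Meta-Llama-3-70B",
    tensor_parallel_size=8,
    enable_lora=True,
    max_loras=2,
    max_lora_rank=16,
)
```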