-
### 🚀 The feature, motivation and pitch
There is huge potential in more advanced load-balancing strategies tailored to the unique characteristics of AI inference, compared to basic strategies such …
-
### Your current environment
```text
vllm 0.5.4
```
### 🐛 Describe the bug
1. Start the vLLM server: `python -m vllm.entrypoints.openai.api_server --served-model-name qwen2 --model /ai-deploy/open-m…`
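For context, a request against the server started above would typically look like the sketch below. This is an assumption, not the reporter's truncated repro step: port 8000 is vLLM's default, and `model` must match `--served-model-name`.

```python
# Minimal sketch (assumed, not the reporter's exact request): query the
# OpenAI-compatible endpoint. Port 8000 is vLLM's default; adjust if the
# server was started with --port.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="qwen2",  # must match --served-model-name above
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```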
-
### Your current environment
```text
podman --version
podman version 5.2.3
uname -a
Linux noelo-work 6.10.12-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Sep 30 21:38:25 UTC 2024 x86_64 GNU/L…
```
-
# ComfyUI Error Report
## Error Details
- **Node Type:** QuickMfluxNode
- **Exception Type:** RuntimeError
- **Exception Message:** [load_safetensors] Failed to open file /Users/wengyinghui/Comf…
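A quick way to narrow down a `[load_safetensors] Failed to open file` error is to verify the path and the file header outside ComfyUI. The path below is a hypothetical stand-in, since the real one is truncated in the report:

```python
# Diagnostic sketch: confirm the checkpoint exists and its safetensors
# header parses. Replace the placeholder with the actual (truncated) path.
from pathlib import Path
from safetensors import safe_open

path = Path("/Users/wengyinghui/ComfyUI/models/your_model.safetensors")  # hypothetical
if not path.is_file():
    print("file missing or not readable:", path)
else:
    with safe_open(str(path), framework="pt") as f:
        print("loadable; first tensors:", list(f.keys())[:5])
```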
-
Hi there! 🤗
`FlashLlamaForCausalLM` uses the name `dense` for its MLP submodule, and when a user wants to employ a LoRA adapter, `get_mlp_weights` skips this submodule.
https://github.com/huggingface/t…
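To illustrate the failure mode (the names and logic below are assumptions for illustration, not TGI's actual code): if the helper collects MLP weights by matching only the usual Llama projection names, a submodule called `dense` never matches and silently drops out of the LoRA targets.

```python
# Hypothetical sketch of the described bug, not TGI's implementation:
# name-based collection that misses an MLP submodule called "dense".
EXPECTED_MLP_NAMES = {"gate_proj", "up_proj", "down_proj"}

def collect_mlp_weights(model):
    """Pick MLP submodules by leaf name; 'dense' falls through the filter."""
    picked = {}
    for name, module in model.named_modules():
        leaf = name.rsplit(".", 1)[-1]
        if leaf in EXPECTED_MLP_NAMES:  # a "dense" submodule never matches
            picked[name] = module
    return picked
```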
-
I compared two ways to launch the server.
The model is vicuna-7b, and the GPUs are 2 × A30.
The first way is:
```
python -m vllm.entrypoints.openai.api_server \
    --model /data/models/vicuna-…
```
-
### 📚 The doc issue
After starting the service with `lmdeploy serve api_server THUDM/chatglm2-6b --adapters mylora=chenchi/lora-chatglm2-6b-guodegang`, can calls use both the bare model and the LoRA-trained one?
For example, when calling OpenAI-style, does `model_name=mylora` invoke the adapte…
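If the server registers the base model and the adapter under separate names, one deployment can serve either per request. A sketch, assuming lmdeploy's default port 23333 and that `GET /v1/models` lists both names (both assumptions worth verifying):

```python
# Sketch: call the same OpenAI-compatible server with the bare model or
# the LoRA adapter by switching the model name. Names and port are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:23333/v1", api_key="EMPTY")
for model_name in ("chatglm2-6b", "mylora"):  # base model vs. LoRA adapter
    resp = client.chat.completions.create(
        model=model_name,
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(model_name, "->", resp.choices[0].message.content)
```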
-
Hi,
I installed version 2024.11.16 on a Heltec LoRa 32 V2 board, and the LCD is not working although it is configured:
`"display": { "alwaysOn": true, "timeout": 5, "turn180": false },`
![2024-1…
-
As stated in your paper, the server distributes the stacked global LoRA module to each client, but how does each client convert this global module into a local module with a lower rank?
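The thread doesn't show the paper's conversion step, but one common way to project a higher-rank global LoRA update down to a lower local rank is a truncated SVD of the effective update matrix. The sketch below (shapes and rank values are assumed) shows that generic approach, not necessarily the paper's method.

```python
# Generic rank-reduction sketch, not necessarily the paper's method:
# compress a stacked LoRA update B @ A to a lower local rank via SVD.
import torch

def truncate_lora(B: torch.Tensor, A: torch.Tensor, r: int):
    """Compress a LoRA update B @ A (d_out x R @ R x d_in) to rank r <= R."""
    delta = B @ A                      # effective weight update, d_out x d_in
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    B_r = U[:, :r] * S[:r]             # d_out x r, singular values folded in
    A_r = Vh[:r, :]                    # r x d_in
    return B_r, A_r

B = torch.randn(4096, 64)              # stacked global module, assumed rank 64
A = torch.randn(64, 4096)
B_8, A_8 = truncate_lora(B, A, r=8)    # local module with lower rank 8
```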
-
### Issue Description
xyz search-and-replace with only x activated works fine, i.e. it applies the LoRA. The prompt is:
```
photo of man on the street
comic book style ,
```
…