-
### Related area
Heltec LoRa ESP32 V2
### Hardware specification
esp32-s3
### Is your feature request related to a problem?
Hello,
I am experiencing difficulties connecting my Heltec LoRa ESP…
-
### Prerequisites
- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md)…
-
### System Info
GPU Name: NVIDIA A800
TensorRT-LLM: 0.10.0
Nvidia Driver: 535.129.03
OS: Ubuntu 22.04
triton-inference-server backend: tensorrtllm_backend
### Who can help?
_No response_
### I…
-
This RFC proposes improvements to the management of Low-Rank Adaptation (LoRA) adapters in vLLM to make it more suitable for production environments. It aims to address several pain points observed …
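One recurring pain point when serving many adapters in production is keeping only the hot ones resident in memory. As a minimal sketch of one such mechanism (the class and method names here are hypothetical, not vLLM's actual API), an LRU policy bounding the number of loaded adapters could look like this:

```python
from collections import OrderedDict


class LoRAAdapterCache:
    """Illustrative LRU cache for LoRA adapters (not vLLM's real API).

    Keeps at most `capacity` adapters resident; when a new adapter is
    loaded past capacity, the least recently used one is evicted.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._adapters: "OrderedDict[str, object]" = OrderedDict()

    def get(self, name: str, loader):
        """Return the adapter, calling `loader()` to load it on a miss."""
        if name in self._adapters:
            self._adapters.move_to_end(name)    # mark as most recently used
            return self._adapters[name]
        adapter = loader()                      # e.g. read weights from disk
        self._adapters[name] = adapter
        if len(self._adapters) > self.capacity:
            self._adapters.popitem(last=False)  # evict least recently used
        return adapter

    def resident(self):
        """Names of currently loaded adapters, oldest first."""
        return list(self._adapters)
```

Under this policy a request for an adapter refreshes its position, so frequently requested adapters stay resident while cold ones are evicted first.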
-
### System Info
2X L4 GPUs
Docker Image:
nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3
### Who can help?
@juney-nvidia @kaiyux
### Information
- [ ] The official example sc…
-
Run the command below:
```
python -m vllm.entrypoints.api_server \
  --model meta-llama/Llama-2-7b-hf \
  --enable-lora \
  --lora-modules sql-lora=~/.cache/huggingface/hub/models--yard1--llama-2-7b-…
```
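Once a server like the one above is up, it is queried over HTTP. A hedged sketch of building such a request follows; the `/generate` endpoint, the field names, and the default port 8000 are assumptions taken from vLLM's example api_server and may differ across versions:

```python
import json
import urllib.request

# Illustrative request body for the example api_server's /generate
# endpoint; exact field names depend on the vLLM version in use.
payload = {
    "prompt": "Translate to SQL: list all users",
    "max_tokens": 64,
    "temperature": 0.0,
}

req = urllib.request.Request(
    "http://localhost:8000/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would send the request once the server
# from the command above is actually running.
```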
-
RTX 4090 (24 GB),
Qwen-7B-Chat.
The model loads OK:
```
model_config = ModelConfig(lora_infos={
"lora_1": conf['lora_1'],
"lora_2": conf['lora_2'],
})
model = ModelFactory.from_huggingface(conf['b…
```
-
Here is the development roadmap for 2024 Q3. Contributions and feedback are welcome.
## Server API
- [ ] Add APIs for using the inference engine in a single script without launching a separate se…
-
I can't explain why, but with the same model, the same base model, and exactly the same prompt, the degree of style transfer from this custom node is far from what HF's online image generation produces. Why is that? Also, long prompts seem to weaken the effect of the LoRA.
-
Hi, thanks for your wonderful work.
I am struggling to use my LoRA-tuned model.
I followed these steps:
1. Fine-tuning with LoRA
- base model: Undi95/Meta-Llama-3-8B-Instruct-hf
- llama3 …