-
Same as https://github.com/vllm-project/vllm/issues/182#issuecomment-1627176207
-
I'm getting great qualitative results from Falcon fine-tuned with AdaptersV2.
Inference quality is better than what I get with huggingface/peft and LoRA, but it is still too slow to scale up.
Could the idea…
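One common way to close the LoRA inference gap is to merge the adapter into the base weights before serving (peft exposes this as `merge_and_unload()` on a `PeftModel`), so the per-token adapter matmuls disappear. A minimal numpy sketch of the underlying algebra, with made-up shapes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes for illustration: d_out x d_in base weight, rank-r adapter.
d_out, d_in, r, alpha = 8, 8, 2, 4

W = rng.standard_normal((d_out, d_in))   # frozen base weight
A = rng.standard_normal((r, d_in))       # LoRA down-projection
B = rng.standard_normal((d_out, r))      # LoRA up-projection
x = rng.standard_normal(d_in)

# Unmerged inference: one extra matmul pair on every forward pass.
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))

# Merged inference: fold the adapter into the base weight once, ahead of time.
W_merged = W + (alpha / r) * (B @ A)
y_merged = W_merged @ x

assert np.allclose(y_adapter, y_merged)
```

The merged model computes exactly the same function, so this trades adapter hot-swapping for plain base-model inference speed.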
-
**Is your feature request related to a problem? Please describe.**
Many models are now becoming multi-modal, that is, they can accept images, videos, or audio during inference. The llama.cpp projec…
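For reference, OpenAI-compatible chat endpoints typically take multi-modal inputs as a list of content parts, mixing text and image entries in one user message. A minimal sketch of such a request payload (field names follow the OpenAI chat format; the model name and URL are placeholders):

```python
# Text and image parts travel together in one "content" list.
request = {
    "model": "some-multimodal-model",  # placeholder model name
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/cat.png"},
                },
            ],
        }
    ],
}

# Each part declares its own type, so a server can route text to the
# tokenizer and images to the vision encoder.
for part in request["messages"][0]["content"]:
    print(part["type"])
```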
-
### 🚀 The feature, motivation and pitch
_No response_
### Alternatives
_No response_
### Additional context
_No response_
-
I used the following code to SFT Llama 3:
```python
import os
import wandb
os.environ["WANDB_PROJECT"] = "unsloth-mimic-20240814" # name your W&B project
os.environ["WANDB_LOG_MODEL"] = "checkpoint" …
-
### Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
### Describe the bug
I tried to benchmark t…
-
### Your current environment
```text
vllm=0.5.4
```
```python
llm = LLM(
    model=MODEL_NAME,
    trust_remote_code=True,
    gpu_memory_utilization=0.5,
    max_model_len=2048,
    tensor_paralle…
```
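With `gpu_memory_utilization=0.5`, half the GPU must hold both the weights and the KV cache, so it is worth sanity-checking the cache budget. A back-of-envelope sketch for a full `max_model_len=2048` sequence (the layer/head/dim numbers are illustrative assumptions for a hypothetical 7B-class model, not the actual model from this report):

```python
# Illustrative model shape assumptions.
n_layers, n_kv_heads, head_dim = 32, 32, 128
bytes_per_elem = 2                      # fp16/bf16

# 2 tensors (K and V) per layer, per token.
kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

max_model_len = 2048
kv_bytes_per_seq = kv_bytes_per_token * max_model_len

print(f"{kv_bytes_per_token / 1024:.0f} KiB per token, "
      f"{kv_bytes_per_seq / 1024**3:.2f} GiB per full-length sequence")
# → 512 KiB per token, 1.00 GiB per full-length sequence
```

Under these assumptions, each concurrent full-length sequence costs about 1 GiB of cache, which bounds the batch size that fits in the reserved half of the GPU.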
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
[data.zip](https://github.com/user-attachments/files/15604248/data.zip)
This is the zip file that triggered the error.
-
### Your current environment
The output of `python collect_env.py`
```text
Your output of `python collect_env.py` here
```
### 🐛 Describe the bug
```console
INFO: 10.244.239.34:38…