-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubun…
-
### Your current environment
```text
The output of `python collect_env.py`
```
```
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTo…
-
I followed the documentation to build the LLaMA 3 8B Instruct model with multiple LoRA adapters, as described in this NVIDIA blog post (https://developer.nvidia.com/zh-cn/blog/deploy-multilingual-llms-w…
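For context, the core idea behind serving one base model with multiple LoRA adapters is that each adapter only adds a low-rank delta to the base weights: y = Wx + (alpha/r)·B(Ax). The sketch below is a toy, pure-Python illustration of that formula using the standard LoRA paper symbols (W, A, B, alpha, r); it is not code from the blog post or from TensorRT-LLM.

```python
# Toy LoRA forward pass: y = W @ x + (alpha / r) * (B @ (A @ x)).
# W is the frozen base weight; A (r x in) and B (out x r) are the
# per-adapter low-rank matrices. Plain lists stand in for tensors.

def matvec(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def lora_linear(W, A, B, x, alpha, r):
    base = matvec(W, x)                     # frozen base projection
    delta = matvec(B, matvec(A, x))         # low-rank update B @ A @ x
    s = alpha / r                           # LoRA scaling factor
    return [b + s * d for b, d in zip(base, delta)]

# 2x2 identity base weight with a rank-1 adapter
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]            # r x in  (1 x 2)
B = [[0.5], [0.5]]          # out x r (2 x 1)
x = [2.0, 4.0]
print(lora_linear(W, A, B, x, alpha=1.0, r=1))  # → [5.0, 7.0]
```

Because only A and B differ per adapter, a server can keep one copy of W and swap (or batch) many small adapters per request.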
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
### Your current environment
```text
The output of `python collect_env.py`
Collecting environment information...
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12…
-
I have a machine with 4×80GB GPUs and set `tensor_parallel_size = 4`, but a RuntimeError occurs. Any idea how to resolve this? Should I change any other parameters? Thank you!
>
python3 -m we…
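As background for what `tensor_parallel_size` changes: the model's weight matrices are sharded across the GPUs, each rank computes its slice, and the results are gathered. A common cause of a RuntimeError here is a model dimension (e.g. the attention-head count) that is not divisible by the tensor-parallel size, though the truncated traceback above does not confirm that. The toy single-process sketch below illustrates the sharding idea only; real vLLM does this across GPUs with NCCL.

```python
# Toy row-wise sharding across tp_size "ranks": each rank holds a slice
# of W, computes a partial output, and the partials are concatenated
# (the analogue of an all-gather). Plain lists stand in for tensors.

def matvec(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def shard_rows(W, tp_size):
    n = len(W)
    # Mirrors the usual divisibility requirement in tensor parallelism.
    assert n % tp_size == 0, "output dim must divide tensor_parallel_size"
    step = n // tp_size
    return [W[i * step:(i + 1) * step] for i in range(tp_size)]

W = [[float(i == j) for j in range(4)] for i in range(4)]  # 4x4 identity
x = [1.0, 2.0, 3.0, 4.0]
shards = shard_rows(W, tp_size=4)          # one row slice per rank
y = [out for s in shards for out in matvec(s, x)]
print(y)  # → [1.0, 2.0, 3.0, 4.0], same as the unsharded matvec(W, x)
```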
-
I'm keen on adding [speculative decoding](https://arxiv.org/abs/2211.17192) to outlines.
Is this something that is being worked on? Otherwise I would be happy to submit a PR but I'd need some advic…
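To make the proposal concrete, here is a minimal greedy sketch of the speculative decoding loop from the linked paper: a cheap draft model proposes k tokens, the target model checks them in one pass, and the longest agreeing prefix is accepted plus one corrected token. The toy "models" below are deterministic stand-ins on integer tokens, not outlines APIs, and this greedy variant omits the paper's rejection sampling.

```python
# Speculative decoding, greedy variant: draft proposes, target verifies.

def draft_model(ctx):       # fast but imperfect: next = last + 1, capped at 5
    return min(ctx[-1] + 1, 5)

def target_model(ctx):      # "ground truth": next = last + 1
    return ctx[-1] + 1

def speculative_step(ctx, k):
    # 1) draft proposes k tokens autoregressively (cheap).
    proposal, c = [], list(ctx)
    for _ in range(k):
        t = draft_model(c)
        proposal.append(t)
        c.append(t)
    # 2) target scores all k positions in one pass (here: a loop),
    #    accepting the agreeing prefix and fixing the first mismatch.
    accepted, c = [], list(ctx)
    for t in proposal:
        want = target_model(c)
        if t == want:
            accepted.append(t)
            c.append(t)
        else:
            accepted.append(want)   # target's correction, then stop
            break
    return ctx + accepted

print(speculative_step([3], k=4))   # → [3, 4, 5, 6]
```

One target pass yields up to k+1 tokens here instead of one, which is the whole speedup; the constrained-generation twist for outlines would be masking both models' choices with the FSM's allowed-token set.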
-
### Your current environment
```text
The output of `python collect_env.py`
```
### 🐛 Describe the bug
Running LLaVA-NeXT fails with an error:
python -m vllm.entrypoints.openai.api_server --model /ai/LLaVA-NeX…
-
Great work!
I tried your [example](https://github.com/SafeAILab/EAGLE#:~:text=llama%2D2%2Dchat%5D-,With%20Code,-You%20can%20use) for llama-7b-chat and changed the tree structure in choices.py into …
-
First of all, thank you for the great work!
Is there any plan to support a paged KV cache in non-contiguous memory, for instance in flash_attn_with_kvcache?
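For readers unfamiliar with the request: in a paged KV cache, a sequence's keys/values live in fixed-size blocks that need not be contiguous in the physical pool, and a per-sequence block table maps logical block indices to physical ones. The toy sketch below illustrates that indirection (the PagedAttention idea); it says nothing about flash-attn internals, and the allocator is a deliberately naive free list.

```python
# Toy paged KV cache: block_table maps logical block -> physical block,
# so a sequence's blocks can be scattered anywhere in the pool.

BLOCK = 4  # tokens per block

class PagedKV:
    def __init__(self, num_blocks):
        self.pool = [[None] * BLOCK for _ in range(num_blocks)]
        self.free = list(range(num_blocks))   # naive free-list allocator
        self.block_table = []                 # logical -> physical mapping
        self.length = 0                       # tokens stored so far

    def append(self, kv):
        if self.length % BLOCK == 0:          # current block full: allocate
            self.block_table.append(self.free.pop())
        phys = self.block_table[self.length // BLOCK]
        self.pool[phys][self.length % BLOCK] = kv
        self.length += 1

    def get(self, pos):                       # read by logical position
        phys = self.block_table[pos // BLOCK]
        return self.pool[phys][pos % BLOCK]

cache = PagedKV(num_blocks=8)
for tok in range(6):
    cache.append(("k%d" % tok, "v%d" % tok))
print(cache.block_table)   # → [7, 6]: physical blocks, not contiguous
print(cache.get(5))        # → ('k5', 'v5')
```

The attention kernel is what has to cooperate: it must gather K/V through the block table instead of assuming one contiguous buffer per sequence, which is exactly what the question asks of flash_attn_with_kvcache.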