-
### Your current environment
```text
The output of `python collect_env.py`
```
### How would you like to use vllm
I'm implementing a custom algorithm that requires a custom generate met…
-
Last Error Received:
Process: Ensemble Mode
If this error persists, please contact the developers with the error details.
Raw Error Details:
RuntimeError: "Invalid buffer size: 35.38 GB"
…
-
Passing the --use-flash-attn flag is intended to enable flash attention; however, when the --use-mcore-models flag (to use the transformer engine) is also specified, flash attention will not be applie…
-
Thank you for taking the time to review my question.
Before I proceed, I would like to mention that I am a beginner, and I would appreciate your consideration of this fact.
I am seeking assistan…
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) and didn't find any similar reports.
### Exp…
-
### The Feature
Support custom input parameters for the Triton embedding server.
### Motivation, pitch
Currently, the input payload params of the Triton Embedding model call are fixed to the below for…
-
I tried to load LoRA training adapters from a DeepSpeed checkpoint:
dir:
```
ls Bunny/checkpoints-llama3-8b/bunny-lora-llama3-8b-attempt2/checkpoint-6000
total 696M
-rw-r--r-- 1 schwan46494@gmail.c…
-
## 🐛 Bug
We need to determine whether Thunder has real accuracy problems computing HF's Qwen 2 model.
The test added in https://github.com/Lightning-AI/lightning-thunder/pull/1406 might fail bec…
-
Very similar to the issues here ([#1553](https://github.com/huggingface/tokenizers/issues/1553), [#1517](https://github.com/huggingface/tokenizers/issues/1517)), but for the newest Llama models the of…
-
Qwen2-VL has always been memory-hungry (compared to the other vision models), and even with Unsloth it still OOMs while the largest Llama 3.2 11B works fine.
I'm using a dataset that has high resolution…