-
### Your current environment
Using vllm/vllm-openai:v0.5.3.post1 Docker image. Executed within the container:
```
Collecting environment information...
PyTorch version: 2.3.1+cu121
Is debug bui…
-
### What happened?
After entering a command, the response gets stuck and begins cycling endlessly through one part of the output.
### Version check
- [X] Yes, I was.
### Relevant log outp…
-
Hello Dusty,
Here's the thing: we've been working on Jetson Orin projects recently. Now that I've decided to copy my environment to another Orin, I'm wondering whether we can load a local model such as VILA or L…
-
### BUG DESCRIPTION
Running a script on Google Colab to fine-tune LLaMA 3 8B with flash attention.
This issue is not directly related to transformers but to an extension library: flash-attention.
…
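One common cause of flash-attention failures on Colab (an assumption here, since the report above is truncated) is the GPU generation: the flash-attn 2 kernels require an Ampere-or-newer GPU (compute capability 8.0+), while Colab's free-tier T4 is sm_75. A tiny sketch of the check:

```python
def supports_flash_attn2(compute_capability: tuple[int, int]) -> bool:
    # flash-attn 2 kernels require Ampere or newer (compute capability >= 8.0);
    # in practice you'd obtain the tuple from torch.cuda.get_device_capability().
    major, _ = compute_capability
    return major >= 8

print(supports_flash_attn2((7, 5)))  # Colab free-tier T4 (sm_75) -> False
print(supports_flash_attn2((8, 0)))  # A100 -> True
```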
-
A couple of issues with the new tensor parallelism implementation!
1) Tensor parallelism doesn't appear to respect the absence of flash attention, even via the `-nfa` flag. It also doesn't document flash att…
-
Llama 3.1 70B Instruct (baseline):

| Benchmark | Meta | Reported by Sahil |
|-----------|------|-------------------|
| GPQA      | 46.7 | 41.07             |
| Math      | 68.0 | 55.70             |

This makes their model's scores on these tests look over-inflated.
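The size of the gap is easy to quantify from the numbers above; a quick sketch (variable names are mine, scores are the ones quoted in the thread):

```python
# Meta's published baseline score vs. the baseline reported by Sahil
scores = {
    "GPQA": (46.7, 41.07),
    "Math": (68.0, 55.70),
}

for bench, (meta, reported) in scores.items():
    gap = meta - reported
    print(f"{bench}: baseline under-reported by {gap:.2f} points")
```

A lower reported baseline widens the apparent improvement of the compared model by exactly that gap.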
-
**Describe the bug**
Adding `"zero_quantized_weights": true,` leads to a crash:
```
[35:1]: warnings.warn(
[35:1]:Traceback (most recent call last):
[35:1]: File "/data/env/lib/repos/retro-l…
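```

For context, `zero_quantized_weights` is a DeepSpeed ZeRO++ option that sits alongside the other `zero_optimization` settings; a minimal config sketch (the surrounding values are illustrative assumptions, not taken from the report):

```json
{
  "zero_optimization": {
    "stage": 3,
    "zero_quantized_weights": true,
    "zero_hpz_partition_size": 8,
    "zero_quantized_gradients": true
  }
}
```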
-
My ComfyUI is not the portable version. I installed Searge LLM with the ComfyUI Manager, and then I installed it manually; in both cases I got the traceback error below. I tried to install the following comm…
-
### What happened?
This started as a problem with Ooba, but I'm seeing the same issue with KoboldCPP and llama.cpp. I updated Ooba the other day, after maybe a week or two of not doing so. While it …
-
I'm trying to set up the project locally, but I'm encountering an error while running the bolna server that is preventing me from proceeding. I have already installed the "llama_index" framework, but it's stil…