-
I'm trying to follow the instructions for installing unsloth in a conda environment, but conda gets stuck when running the install commands.
I've tried running it twice, both times…
-
### Your current environment
Not applicable -- Dockerfile.
### 🐛 Describe the bug
Steps to reproduce:
- Clone the `vllm` repo
- Run `docker build . --target vllm-base`
- Build fails
```shel…
-
I have noticed that Alpaca uses my CPU instead of my GPU. Here's a screenshot showing how it's using almost 40% of my CPU, and only 1% of my GPU.
![Captura desde 2024-07-10 06-51-39](https://github…
-
I tried converting Google Gemma 2B models to TFLite, but the conversion failed.
### 1. System information
- Ubuntu 22.04
- TensorFlow installation (installed with keras-nlp) :
- TensorFlow l…
-
### Feature request
At present, the vllm and sglang versions used in xinference are outdated. Newer sglang supports gemma-2 models out of the box, but the sglang engine in xinference does not. Same is…
-
### Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related iss…
-
### What happened?
Llama 3.1 8B quantized after https://github.com/ggerganov/llama.cpp/pull/8676 fails the "wicks" problem that Llama 3 8B can answer correctly.
Prompt: `Making one candle requir…
-
### Your current environment
```text
The output of `python collect_env.py`
```
### 🐛 Describe the bug
Recently, we have seen reports of `AsyncEngineDeadError`, including:
- [ ] #5060
…
-
### Which domain(s) should be blocked?
auditrecording-pa.googleapis.com
clienttracing-pa.googleapis.com
datasaver.googleapis.com
feedback-pa.googleapis.com
growth-pa.googleapis.com
gvt2-cn.com…
-
Tried to make an EXL2 of it.
I added the fix to the `inv_freq` scaling that is apparently expected in these models, making the following change in `model.py` (see https://huggingface.co/v2ray/Llama…