-
I am getting "float division by zero" error whenever I try to quantize mixtral related models with autogptq,
and here is my code.
```
from transformers import AutoTokenizer, TextGenerationPipeli…
```
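For reference, a full Mixtral quantization script with AutoGPTQ typically follows the pattern below. This is a sketch, not the truncated code above: the model ID, calibration texts, and output directory are all placeholders.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # placeholder model ID

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights
    group_size=128,  # quantization group size
    desc_act=False,  # skip activation-order reordering
)

# Calibration data. One commonly reported trigger for division-by-zero
# errors with MoE models is a calibration set that routes no tokens to
# some expert, so use enough varied samples that every expert sees data.
examples = [
    tokenizer("This is a calibration sample for GPTQ quantization."),
    tokenizer("Mixture-of-experts layers route each token to a few experts."),
]

model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)
model.quantize(examples)
model.save_quantized("mixtral-gptq-4bit")  # placeholder output directory
```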
-
Hi,
I was able to run the _TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ_ model on two A10 GPUs on AWS SageMaker, using the _ml.g5.12xlarge_ instance type.
Command to run the code:
`python3 -m vllm.ent…
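Since the command is cut off, here is the same setup expressed through vLLM's Python API; the argument values are my assumptions for a two-GPU box, not the original flags:

```python
from vllm import LLM, SamplingParams

# Tensor-parallel across the two A10 GPUs; GPTQ weights run in half precision.
llm = LLM(
    model="TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ",
    quantization="gptq",
    tensor_parallel_size=2,
    dtype="half",
)

outputs = llm.generate(["What is MoE routing?"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```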
-
Since there are already a few models out there quantized with Half-Quadratic Quantization (HQQ), vLLM should support them as well:
```sh
api_server.py: error: argument --quantization/-q: invalid choice: 'hqq' (choose from …
```
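The list of valid choices is cut off above; one way to check what a given build accepts is to inspect the registry directly (a sketch, assuming a vLLM version that exposes `QUANTIZATION_METHODS` at this path):

```python
# Prints the quantization method names this vLLM build supports,
# e.g. to confirm whether 'hqq' is present.
from vllm.model_executor.layers.quantization import QUANTIZATION_METHODS

print(sorted(QUANTIZATION_METHODS))
```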
-
Looking forward to support for Mixtral_8x7b MoE.
-
It displays this error message:
"cupy_backends.cuda.libs.nccl.NcclError: NCCL_ERROR_INVALID_USAGE: invalid usage"
This error happens with vllm==0.3.2, while vllm==0.2.7 works fine.
To reproduce it:
…
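Since the reproduction steps are truncated, a minimal two-GPU script that exercises the same cupy/NCCL initialization path would look roughly like this (my sketch; the model is a placeholder):

```python
# Initializing tensor parallelism is enough to trigger the NCCL setup
# that this report says fails on vllm==0.3.2.
from vllm import LLM

llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # placeholder model
    tensor_parallel_size=2,                        # needs NCCL across 2 GPUs
)
print(llm.generate(["hello"])[0].outputs[0].text)
```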
-
**LocalAI version:**
2.5.1-cublas-cuda12
**Environment, CPU architecture, OS, and Version:**
Ubuntu 22.04 with 2 RTX A5000 24 GB GPUs
**Describe the bug**
My problem is that this model mixt…
-
{"message": "Error in _stream_synthesis_task\nTraceback (most recent call last):\n File \"/root/pythonenv/enve/lib/python3.10/site-packages/livekit/agents/utils/log.py\", line 16, in async_fn_logs\n …
-
Hello,
I'm looking to reproduce some of the open-source model results from the VWA paper:
(1) the Mixtral-8x7B model as the LLM backbone for the Caption-augmented model
(2) CogVLM for the Multimodal Mode…
-
While loading Mixtral I get:
"AssertionError: Insufficient space in device allocation"
Command I used:
`python ericLLM.py --model ./models/mistralai_Mixtral-8x7B-Instruct-v0.1 --gpu_split 24,24,24,24,…`
-
Hello,
I wonder why my Doc().query requests often return random and poor-quality answers in terms of relevance.
Papers are sometimes relevant, sometimes not... Citations are, most of the time, co…
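For context, a minimal query flow with the paper-qa library (which this question appears to be about) looks like the sketch below; the file name and question are placeholders:

```python
from paperqa import Docs

docs = Docs()
docs.add("my_paper.pdf")  # placeholder path; add each source PDF to the index

answer = docs.query("What methods does this paper propose?")
print(answer.formatted_answer)  # answer text followed by its citations
```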