-
### Your current environment
I'm trying to run inference with docker-compose; host: Ubuntu 22.04
```
uname -r
5.19.0-1010-nvidia-lowlatency
```
```
version: '3.8'
services:
mixtral7x8b:…
-
I'm using the Python client with the streaming async code from the documentation.
It raises: ``httpx.ResponseNotRead: Attempted to access streaming response content, without having called `read()`.``
```…
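The error above means the body of a streamed response was touched before it was buffered. A minimal stdlib-only sketch of the lazy-content semantics behind `httpx.ResponseNotRead` (the class below is a hypothetical stand-in, not the real httpx internals):

```python
class StreamingResponse:
    """Hypothetical stand-in for an httpx streaming response."""

    def __init__(self, chunks):
        self._chunks = chunks
        self._content = None  # body is not buffered until read() is called

    def iter_bytes(self):
        # Consuming the stream chunk by chunk never touches .content
        yield from self._chunks

    def read(self):
        # Buffer the whole body; afterwards .content is safe to access
        self._content = b"".join(self._chunks)
        return self._content

    @property
    def content(self):
        if self._content is None:
            # The real client raises httpx.ResponseNotRead here
            raise RuntimeError(
                "Attempted to access streaming response content,"
                " without having called `read()`."
            )
        return self._content


resp = StreamingResponse([b"hel", b"lo"])
chunks = list(resp.iter_bytes())  # streaming path works without read()
resp.read()                       # buffer the body first...
body = resp.content               # ...then .content is available
```

With the real client the fix is the same shape: either consume the stream (`response.iter_bytes()`, or the async `aiter_bytes()` / `aiter_text()` variants) or call `response.read()` / `await response.aread()` before accessing `response.content`.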
-
Code to reproduce:
```python
import trl
from unsloth import FastLanguageModel
import torch
from tqdm import tqdm
from transformers import AutoTokenizer
from datasets import load_dataset
fr…
-
Environment:
Python version: 3.9
MistralAI package version: 0.0.8
Description:
Encountered an AttributeError when attempting to use the MistralClient.chat() method with ChatMessage obj…
-
### Your current environment
```text
The output of `python collect_env.py`
```
### How would you like to use vllm
I would like to use the [JSON mode](https://docs.mistral.ai/capabilities/json_m…
-
When trying to export large models, currently we are forced to export QDQ pattern for weights, instead of simply exporting Integer weights -> DQ.
The error seems to be caused by the fact that addin…
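For illustration, a stdlib-only sketch (function names hypothetical) of why the two export patterns are numerically equivalent: in the QDQ form the float weight passes through QuantizeLinear and then DequantizeLinear, while the integer-weight form stores the already-quantized int8 tensor and exports only the DequantizeLinear node, at a much smaller storage cost:

```python
# Hypothetical sketch, assuming symmetric per-tensor int8 quantization.
def quantize(w, scale):
    # QuantizeLinear: round to the nearest int8 step and clamp to [-128, 127]
    return [max(-128, min(127, round(x / scale))) for x in w]

def dequantize(q, scale):
    # DequantizeLinear: map the integers back to floats
    return [x * scale for x in q]

weights = [0.5, -1.0, 0.25]
scale = 1.0 / 127

# QDQ pattern: float weight -> QuantizeLinear -> DequantizeLinear
qdq = dequantize(quantize(weights, scale), scale)

# Integer-weight pattern: store int8 weights, export only DequantizeLinear
int_weights = quantize(weights, scale)
dq_only = dequantize(int_weights, scale)

assert qdq == dq_only  # same values; the int8 form avoids storing floats
```

Because both paths compute identical values, exporting integer weights followed by a single DQ node loses nothing numerically while avoiding the duplicated float copy of each large weight tensor.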
-
### My current environment
```text
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC vers…
-
### Bug Description
I'm using the Evaluator within a Flask app.
This works fine with all these LLMs:
- llama-index-llms-openai
- llama-index-llms-openailike
- llama-index-llms-together
But f…
-
also part of the error:
```
File "/mmfs1/scratch/anamaria/privateGPT2/privateGPT/privategpt/components/llm/llm_component.py", line 37, in __init__
    logger.warning(
Message: 'Failed to download tokenizer…
```
-
TogetherAI just [announced](https://www.together.ai/blog/function-calling-json-mode) JSON mode and function calling for their models. It currently supports these models: Mixtral, Mistral, and CodeLlam…