-
On my RX 6800 I seem to get `RuntimeError: FlashAttention only supports AMD MI200 GPUs or newer.` for some reason. I Googled that GPU and it seems to be RDNA2 like mine, but for enterprise. Is this not…
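For context, FlashAttention's ROCm build gates on the GPU's reported GFX architecture name, and consumer RDNA2 parts report a different arch family than the CDNA2-based MI200. A minimal sketch of that kind of gate (the allow-list and helper are illustrative, not FlashAttention's actual code):

```python
# Sketch of an architecture allow-list like the one behind this error.
# MI200-series GPUs report "gfx90a" (CDNA2), while the RX 6800 (RDNA2)
# reports "gfx1030" -- so a consumer card fails the check even though
# both architectures are "2nd generation".

SUPPORTED_ARCH_PREFIXES = ("gfx90a", "gfx94")  # illustrative allow-list

def is_supported(arch_name: str) -> bool:
    """Return True if the reported GFX arch is on the allow-list."""
    return arch_name.startswith(SUPPORTED_ARCH_PREFIXES)

print(is_supported("gfx90a"))   # MI200
print(is_supported("gfx1030"))  # RX 6800
```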
-
### Your current environment
environment

```shell
pip install auto_gptq modelscope xformers torchvision torchaudio torch==2.1.2 -U
pip install datasets huggingface-hub transformers==4.39.1 -U
pip install…
```
uRENu updated
3 months ago
-
### Description
I have a locally deployed agent that calls the qwen-max model. Does each new conversation consume a corresponding amount of GPU memory? After a few conversations, once VRAM is full, is waiting the only option?
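Each active conversation does hold its own KV cache on the GPU, so memory grows with the number of concurrent sessions. A rough back-of-the-envelope estimate (all model dimensions below are illustrative placeholders, not qwen-max's real configuration):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, dtype_bytes=2):
    """Approximate KV-cache size for one sequence: 2 tensors (K and V)
    per layer, each [kv_heads, seq_len, head_dim], at dtype_bytes/elem."""
    return 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes

# Hypothetical 7B-class model: 32 layers, 32 KV heads, head_dim 128,
# one 4096-token conversation in fp16 (2 bytes/element).
per_conv = kv_cache_bytes(32, 32, 128, 4096)
print(per_conv / 2**30, "GiB per conversation")  # 2.0 GiB
```

At these made-up dimensions, ten concurrent 4k-token conversations would need ~20 GiB just for KV caches, so once VRAM is exhausted, new requests must queue or be evicted.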
### Link
_No response_
-
InstructLab 0.13 supports hardware acceleration for Apple Silicon (via `mlx`) and CUDA-like GPUs (NVIDIA CUDA and AMD ROCm via `torch.cuda`). I would like to add support for Intel Gaudi 2 hardware and…
tiran updated
2 months ago
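One way to structure support for an additional backend like Gaudi is to probe for each vendor's Python package before committing to a device. A minimal sketch (the `habana_frameworks` package name is real; the probe order and returned labels are illustrative, not InstructLab's actual logic):

```python
import importlib.util

def detect_backend() -> str:
    """Pick an accelerator backend by checking which packages are installed.

    find_spec returns None when a package is absent, so this is a cheap
    availability probe that never imports the heavy frameworks themselves.
    """
    if importlib.util.find_spec("habana_frameworks") is not None:
        return "hpu"            # Intel Gaudi via habana_frameworks.torch
    if importlib.util.find_spec("torch") is not None:
        return "torch_device"   # defer the cuda/mps/cpu choice to torch
    return "cpu"

print(detect_backend())
```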
-
### The Feature
We support this for completions; it also needs to be supported for async completions so it works through the proxy.
### Motivation, pitch
A user faced an issue trying to make calls to mixtral on vllm using us
###…
-
### Your current environment
```text
Collecting environment information...
/data/miniconda3_new/envs/vllm/lib/python3.10/site-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORM…
-
### System Info
ubuntu22
conda
python3.11
nvidia-cudnn-cu12
torch 2.3.0
vllm 0.5.0.post1
vllm-flash-attn 2.5.9…
-
### 🚀 The feature, motivation and pitch
I'm using a newer version of `outlines` than v0.0.34, and my application needs the fixes implemented in newer versions of that package. It would be great if …
-
Hello, could you do something for the open-mixtral-8x7b model to fix the truncation bug on "la plateforme"?
Here they explain what they did to solve it on a vllm server (with spacing between ` an…
-
### 🚀 The feature, motivation and pitch
They claim major improvements over vllm. Unfortunately there is no code, only the paper.
arxiv.org/abs/2405.04437
### Alternatives
_No response_
### Additional context
…