-
Hi akash-aky, first of all, thank you for creating `Exile`; it's an amazing library! I recently ran into some problems using it to execute this application in `parallel`. Here are my debug results:
`…
-
### Your current environment
vllm 0.5.2
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to b…
-
**Describe the bug**
Resource estimation for the vLLM backend is incorrect and ignores quantization (a rough illustration follows the steps below).
**Steps to reproduce**
1. On a GPU server with 4 L20 (48 GB VRAM) cards, without any model deploy…
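To make the impact concrete, here is a minimal, hypothetical sketch of a weights-only VRAM estimate; the function, the model size, and the bytes-per-parameter values are illustrative assumptions, not the project's actual estimator:

```python
# Hypothetical, weights-only VRAM estimate illustrating why ignoring
# quantization inflates the requirement; not the project's real estimator.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def estimate_weight_vram_gb(num_params_billion: float, dtype: str) -> float:
    """Rough weights-only footprint; a real estimate must also budget
    KV cache and activation memory."""
    return num_params_billion * 1e9 * BYTES_PER_PARAM[dtype] / 1024**3

# A 72B model needs ~134 GB at fp16 (more than two 48G cards for the
# weights alone) but only ~34 GB at int4 (fits on one card), so an
# estimator that assumes fp16 regardless of quantization over-allocates
# GPUs by roughly 4x.
for dtype in ("fp16", "int4"):
    print(f"{dtype}: {estimate_weight_vram_gb(72, dtype):.1f} GB")
```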
-
### System Info
```shell
accelerate 1.1.1
neuronx-cc 2.14.227.0+2d4f85be
neuronx-distributed 0.8.0
neuronx-distributed-training 1.0.0
optimum …
-
### Prerequisites
- [X] I have read the [ServerlessLLM documentation](https://serverlessllm.github.io/).
- [X] I have searched the [Issue Tracker](https://github.com/ServerlessLLM/ServerlessLLM/issue…
-
### UPDATE (11/23/2024)
Currently, @james-p-xu is removing rope, @yizhang2077 is removing distributed, and @HandH1998 is removing the weight loader. Optimistically, we can remove these dependencies by the…
-
### Your current environment
Running via Docker
```text
docker run --runtime nvidia --gpus "device=${CUDA_VISIBLE_DEVICES}" --shm-size 8g -v $volume:/root/.cache/huggingface …
-
### Your current environment
```text
The environment is the latest vllm-0.5.4 Docker environment, and the command to run is: python3 api_server.py --port 10195 --model /data/models/Mistral-Large-Ins…
-
My code:
from vllm import LLM, SamplingParams
from chatharuhi import ChatHaruhi  (merely importing ChatHaruhi here raises "Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'sp…
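The truncated message is the standard PyTorch hint to use the 'spawn' start method. A minimal sketch of that workaround, with a placeholder model path rather than the issue author's actual setup:

```python
# Force the 'spawn' start method before anything touches CUDA, so child
# processes do not inherit an already-initialized CUDA context via fork.
import multiprocessing as mp

def main():
    # Import CUDA-touching libraries inside the entry point.
    from vllm import LLM, SamplingParams
    llm = LLM(model="/path/to/model")  # placeholder path
    outputs = llm.generate(["Hello"], SamplingParams(temperature=0.8))
    print(outputs)

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)  # must run before CUDA init
    main()
```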
-
```
2024-11-09 21:39:44.994636: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already b…