-
### System Info
```shell
accelerate 1.1.1
neuronx-cc 2.14.227.0+2d4f85be
neuronx-distributed 0.8.0
neuronx-distributed-training 1.0.0
optimum …
-
### Your current environment
vllm 0.5.2
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to b…
-
### Prerequisites
- [X] I have read the [ServerlessLLM documentation](https://serverlessllm.github.io/).
- [X] I have searched the [Issue Tracker](https://github.com/ServerlessLLM/ServerlessLLM/issue…
-
### UPDATE(11/23/2024)
Currently, @james-p-xu is removing rope, @yizhang2077 is removing distributed, @HandH1998 is removing weight loader. Optimistically, we can remove these dependencies by the…
-
My code
from vllm import LLM, SamplingParams
from chatharuhi import ChatHaruhi (merely importing ChatHaruhi here raises "Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'sp…
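The error above is PyTorch's standard complaint when a forked child process tries to initialize CUDA a second time. A minimal sketch of the usual workaround, switching the multiprocessing start method from the default `fork` to `spawn` (the `double` worker and its payload are illustrative, not taken from the original report):

```python
import multiprocessing as mp

def double(x):
    # In the real report this worker would touch CUDA; a spawned worker
    # starts a fresh interpreter, so CUDA can be initialized safely in it.
    return x * 2

if __name__ == "__main__":
    # 'spawn' instead of the default 'fork' avoids re-initializing CUDA
    # in a forked child process.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        results = pool.map(double, [1, 2, 3])
    print(results)
```

Calling `multiprocessing.set_start_method("spawn")` once at program start achieves the same thing globally; using a context as above keeps the choice local to this pool.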
-
### Your current environment
Running via Docker
```text
docker run --runtime nvidia --gpus \"device=${CUDA_VISIBLE_DEVICES}\" --shm-size 8g -v $volume:/root/.cache/huggingface …
-
### Your current environment
```text
The environment is the latest vllm-0.5.4 docker environment, and the command to run is: python3 api_server.py --port 10195 --model /data/models/Mistral-Large-Ins…
-
```
2024-11-09 21:39:44.994636: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already b…
-
### Your current environment
AMD Radeon + Kubernetes
### Model Input Dumps
`vllm serve mistralai/Mistral-7B-Instruct-v0.3 --trust-remote-code --enable-chunked-prefill --max_num_batch…
-
### The rule detects the following modeling patterns
* Detect when the number of incoming flows of a parallel gateway does not match the number of outgoing flows of the closest parallel gateway…
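The first pattern above can be sketched as a flow-count comparison over a toy gateway model (the `ParallelGateway` dataclass and `join_mismatch` helper are hypothetical names for illustration, not part of the actual rule implementation):

```python
from dataclasses import dataclass

@dataclass
class ParallelGateway:
    # Counts of sequence flows entering and leaving the gateway.
    incoming: int
    outgoing: int

def join_mismatch(split: ParallelGateway, join: ParallelGateway) -> bool:
    # Flag the pattern: the joining gateway's incoming-flow count does not
    # match the outgoing-flow count of the closest splitting gateway.
    return split.outgoing != join.incoming

# A split with 3 outgoing branches, joined by a gateway with only 2 incoming flows:
print(join_mismatch(ParallelGateway(1, 3), ParallelGateway(2, 1)))  # True
```

A real implementation would first pair each join with its closest matching split by walking the process graph; the check itself reduces to this comparison.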