-
**Describe the bug**
**Steps to reproduce**
1. Deploy the model 'OpenGVLab/InternVL2-4B' from Hugging Face and configure the backend with the parameter `--trust-remote-code`.
2. View the instance log…
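The reported setup can be reproduced with a launch command along these lines (a minimal sketch: only the model name and `--trust-remote-code` come from the report; the entrypoint invocation and any other flags are assumptions):

```shell
# Hypothetical vLLM launch for the model in question.
# --trust-remote-code allows execution of the model repo's custom code,
# which InternVL2 requires.
python -m vllm.entrypoints.openai.api_server \
    --model OpenGVLab/InternVL2-4B \
    --trust-remote-code
```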
-
### Your current environment
The output of `python collect_env.py`:
```text
root@9b33a89c3857:/workspace/vllm-0.4.2# python collect_env.py
Collecting environment information...
PyTorch versi…
```
-
Hi team,
Optimum Neuron is looking into adding speculative decoding support for some seq2seq models. There seems to be an example from the Annapurna team, but the link to the resource is missing. C…
-
> Note: This is half bug (since it causes unnecessary errors in certain situations), and half feature request (since LMDeploy itself is not responsible for connection timeouts). I wasn't sure which to…
-
For example, 1.1B tinyllama.
-
### Your current environment
The output of `python env.py`
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1…
-
I am calling `encode` from whisperX/faster-whisper.
Since `encode` can take 200 ms in my use case, and I call it very often for many users, I would like the ability to do early stopping in the…
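The pattern being requested, polling a cancellation callback between encoder steps so a caller can abort mid-call, can be sketched in pure Python. Note that `encode`, its `should_stop` parameter, and the per-chunk work below are a hypothetical stand-in for illustration, not faster-whisper's actual API:

```python
def encode(chunks, should_stop=lambda: False):
    """Hypothetical encoder loop that polls a cancellation callback
    between chunks, so the caller can abort and get partial results."""
    results = []
    for chunk in chunks:
        if should_stop():
            break  # early stop: return what has been computed so far
        results.append(chunk * 2)  # stand-in for real per-chunk work
    return results

# Example: request a stop once two chunks have been processed.
seen = {"n": 0}
def stop_after_two():
    seen["n"] += 1
    return seen["n"] > 2

partial = encode([1, 2, 3, 4], should_stop=stop_after_two)
```

The callback-polling design keeps the hot loop free of locks; the caller flips a flag (or counts invocations, as here) and the encoder checks it at chunk boundaries.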
-
**Intention**
Disassemble the AArch64 binary.
**Describe the bug**
When I use Dyninst to disassemble an AArch64 binary that is stripped, it triggers an assertion failure: `instructionAPI/src/aarch64_opcode_ta…
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
### Your current environment
```
docker pull vllm/vllm-openai:latest
docker run -d --restart=always \
--runtime=nvidia \
--gpus '"device=1"' \
--shm-size=10.24gb \
-p 5001:500…