-
Hello. Thanks for providing vLLM, a great open-source tool for inference and model serving! I was able to build vLLM on a cluster I maintain, but it only appears to work on a single MI210 GPU. Can so…
-
I'm trying to install it into NVIDIA's PyTorch container and getting this error while running.
The same issue occurs when installing it on a Lambda GPU Cloud H100 instance (all defaults).
```
root@0971a018b7ec…
-
### Your current environment
```
Collecting environment information...
PyTorch version: 2.2.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: …
-
### Your current environment
Docker latest 0.5.4
```
docker pull vllm/vllm-openai:latest
docker run -d --restart=always \
--runtime=nvidia \
--gpus '"device=0"' \
--shm-size=10.…
-
### Your current environment
I used version 0.4.3, installed via pip, with CUDA version 12.0 on an A100 GPU.
RuntimeError: t == DeviceType::CUDA INTERNAL ASSERT FAILED
### 🐛 Describe the bug
```
INFO 06-02 03…
-
```
D:\MiniCPM\venv\Lib\site-packages\torch\_tensor.py:962: UserWarning: The operator 'aten::pow.Scalar_out' is not currently supported on the ocl backend. Please open an issue at for requesting supp…
-
Following the Slack conversation about device Perf CI responsibility: to better distribute CI monitoring among the various model owners, the CI has to be split into multiple jobs.
Ini…
-
### Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain documentation with the integrated search.
- [X] I used the GitHub search to find a sim…
-
I found that Google's latest open-source LLM, Gemma, has two versions of its model structure:
1. https://github.com/google/gemma_pytorch/blob/main/gemma/model_xla.py
2. https://github.com/google/gemma_…
-
Hi,
Great work on onnxscript! I was wondering whether collective operations / MPI primitives like `reduce_scatter` and `all_gather` will be added somewhere down the road? If not, I'd be very curi…
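For readers unfamiliar with these primitives, here is a minimal pure-Python sketch of what `all_gather` and `reduce_scatter` compute. This is only an illustration of the collectives' semantics (simulated over plain lists, one entry per "rank") — it is not the onnxscript or MPI API, and the function names here are just for demonstration:

```python
# Semantics of two MPI-style collectives, simulated without any
# communication: each outer list element stands for one rank's data.

def all_gather(per_rank_chunks):
    # Every rank ends up with the concatenation of all ranks' chunks.
    gathered = [x for chunk in per_rank_chunks for x in chunk]
    return [list(gathered) for _ in per_rank_chunks]

def reduce_scatter(per_rank_vectors):
    # Element-wise sum across ranks, then each rank keeps only its
    # own equal-sized shard of the reduced result.
    n = len(per_rank_vectors)
    summed = [sum(vals) for vals in zip(*per_rank_vectors)]
    shard = len(summed) // n
    return [summed[r * shard:(r + 1) * shard] for r in range(n)]

# Two "ranks":
print(all_gather([[1, 2], [3, 4]]))                      # each rank: [1, 2, 3, 4]
print(reduce_scatter([[1, 2, 3, 4], [10, 20, 30, 40]]))  # [[11, 22], [33, 44]]
```

In real frameworks (e.g. `torch.distributed`) these run across processes or devices; the point of the sketch is only the input/output contract a future onnxscript op would need to express.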