Feature request
It would be great if OpenLLM supported pre-Ampere-architecture CUDA devices. In my case, I'm looking at the Volta architecture.
The README currently states that an Ampere-architecture or newer GPU is required to use the vLLM backend; otherwise you're limited to the PyTorch backend.
As far as I can tell, this is not a vLLM-specific constraint: vLLM itself does not require an Ampere-architecture device.
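As a rough illustration, something like the following could gate the vLLM backend on the device's compute capability rather than on Ampere specifically. This is a minimal sketch, not OpenLLM's actual backend-selection logic, and `can_use_vllm_backend` is a hypothetical helper; the only assumption baked in is that vLLM's documentation lists compute capability 7.0 (Volta) and newer as supported.

```python
# Minimal sketch (not OpenLLM's actual code) of selecting a backend based on
# CUDA compute capability instead of hard-requiring Ampere (sm_80).
# Volta is sm_70, which vLLM's own requirements list as supported.
import torch

def can_use_vllm_backend() -> bool:
    """Return True if the current GPU meets vLLM's documented minimum (7.0)."""
    if not torch.cuda.is_available():
        return False
    major, minor = torch.cuda.get_device_capability()
    # Hypothetical threshold: checking >= (7, 0) rather than >= (8, 0)
    # would allow Volta cards such as the Tesla V100.
    return (major, minor) >= (7, 0)

backend = "vllm" if can_use_vllm_backend() else "pt"
print(f"selected backend: {backend}")
```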
Motivation
I am trying to run OpenLLM on my NVIDIA Tesla V100 (32GB) devices, but I cannot use the vLLM backend, because OpenLLM's vLLM backend does not support the Volta architecture.
Other
I would love to help however I can, but I can't find any documentation for where this constraint comes from other than the README. I've gone through vLLM's docs, and they do not indicate that this is a vLLM constraint.