[Bug] Which (types of) GPUs does lmdeploy support, and which are explicitly unsupported?

jxfruit commented 2 months ago

Checklist

[ ] 1. I have searched related issues but cannot get the expected help.
[ ] 2. The bug has not been fixed in the latest version.

Describe the bug

I saw in https://github.com/InternLM/lmdeploy/issues/1781 that the status of the GeForce GTX 1060 is unknown. My local machine has a GeForce GTX 1660, and that one works.

Reproduction

On a TITAN XP, with turbomind as the backend, I get a variety of errors, for example:

1) An error at startup: safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer

2) An error during inference: RuntimeError: CUDA error: unrecognized error code CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

3) A crash: terminate called after throwing an instance of 'terminate called recursively std::runtime_error' terminate called after throwing an instance of 'std::runtime_error' what(): [TM][ERROR] CUDA runtime error: no kernel image is available for execution on the device /lmdeploy/src/turbomind/kernels/sampling_topp_kernels.cu:1219

what(): [TM][ERROR] CUDA runtime error: no kernel image is available for execution on the device /lmdeploy/src/turbomind/kernels/sampling_topp_kernels.cu:1219

Aborted (core dumped)

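I tried the internlm2-chat models from 1.8b to 20b, both quantizing them offline to 4-bit with auto_awq first and adding the --quant-policy 4 argument at startup. All attempts eventually failed.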

Environment

lmdeploy: 0.5.0
pytorch: 2.1
python: 3.11

One more question: is the GTX 30 series supported? Could you add a note about this to the official documentation? Thanks.

Error traceback

No response

zhyncs commented 2 months ago

CUDA Compute Capability: sm75, sm80, sm86, sm89, sm90

zhyncs commented 2 months ago

On a TITAN XP I get a variety of errors

It's Pascal Architecture and its compute capability is sm60.
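For anyone who wants to check their own card against the list above, here is a minimal sketch. It assumes PyTorch is installed and uses `torch.cuda.get_device_capability`; the names `LISTED_CAPABILITIES`, `sm_name`, `is_listed`, and `local_capability` are made up for illustration and are not part of lmdeploy. A "no kernel image is available" error typically means the binary was not built for the device's architecture, which is what this comparison tries to surface.

```python
# Compute capabilities listed in the comment above, as (major, minor) tuples.
# This set is copied from the thread; it is not read from lmdeploy itself.
LISTED_CAPABILITIES = {(7, 5), (8, 0), (8, 6), (8, 9), (9, 0)}


def sm_name(capability):
    """Format a (major, minor) tuple as an 'smXY' string."""
    major, minor = capability
    return f"sm{major}{minor}"


def is_listed(capability):
    """Return True if the capability appears in LISTED_CAPABILITIES."""
    return tuple(capability) in LISTED_CAPABILITIES


def local_capability(device_index=0):
    """Return (major, minor) for a local GPU, or None if torch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(device_index)
```

Calling `local_capability()` on the machine in question and passing the result to `is_listed` gives a quick yes/no; `is_listed((6, 0))` returns `False`, matching the sm60 case described here.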

jxfruit commented 2 months ago

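I checked the official documentation again. It only mentions the supported CUDA compute capabilities in the context of quantization. Do non-quantized models also require a GPU with one of these compute capabilities?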

zhyncs commented 2 months ago

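I checked the official documentation again. It only mentions the supported CUDA compute capabilities in the context of quantization. Do non-quantized models also require a GPU with one of these compute capabilities?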

Yes

lvhan028 commented 2 months ago

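I checked the official documentation again. It only mentions the supported CUDA compute capabilities in the context of quantization. Do non-quantized models also require a GPU with one of these compute capabilities?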

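Inference with non-quantized models (bf16, fp16) supports GPUs with CUDA sm70 and above. This includes V100 (Volta), T4 and the 20 series (Turing), A10, A100, and the 30 series (Ampere), and the 40 series (Ada Lovelace).

4-bit weight quantization supports the CUDA architectures listed in the documentation.

4/8-bit KV cache quantization supports the same CUDA architectures as non-quantized model inference.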
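The sm70 threshold for non-quantized inference and KV cache quantization can be written as a simple tuple comparison. This is only an illustrative sketch: `MIN_CAPABILITY`, `EXAMPLE_GPUS`, and `meets_minimum` are hypothetical names, and the compute capabilities in the table are taken from NVIDIA's public CUDA GPUs page rather than from lmdeploy, so they are worth re-checking there. The 4-bit weight quantization case is left out because its supported architectures are defined in the documentation, not in this comment.

```python
# Minimum compute capability for non-quantized inference and KV cache
# quantization, per the comment above (sm70).
MIN_CAPABILITY = (7, 0)

# A few of the GPUs named above with their compute capabilities.
# Values assumed from NVIDIA's CUDA GPUs table, not from lmdeploy.
EXAMPLE_GPUS = {
    "V100": (7, 0),  # Volta
    "T4": (7, 5),    # Turing
    "A100": (8, 0),  # Ampere
    "A10": (8, 6),   # Ampere
}


def meets_minimum(capability, minimum=MIN_CAPABILITY):
    """Return True if a (major, minor) capability is at or above the minimum."""
    # Tuples compare lexicographically, so (7, 5) >= (7, 0) and (6, 1) < (7, 0).
    return tuple(capability) >= tuple(minimum)


if __name__ == "__main__":
    for name, capability in EXAMPLE_GPUS.items():
        print(name, capability, meets_minimum(capability))
```

Every entry in `EXAMPLE_GPUS` passes the check, while a Pascal-generation capability such as `(6, 0)` does not.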

zhyncs commented 2 months ago

Yes you are right. V100 is supported, it is sm70.

zhyncs commented 2 months ago

ref https://developer.nvidia.com/cuda-gpus#compute

suchang1992 commented 2 months ago

Is there any way to get sm60 supported? Or do other frameworks support it?

Weiqiang-Li commented 2 months ago

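I also need sm60 support. Is there another way to launch an OpenAI-style service? I tried vllm and it failed as well, since it also only supports sm70 and above. I tried transformers, and it runs on a 1080. Do I really have to write the OpenAI interface by hand? 😂 😂 😂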

suchang1992 commented 2 months ago

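I also need sm60 support. Is there another way to launch an OpenAI-style service? I tried vllm and it failed as well, since it also only supports sm70 and above. I tried transformers, and it runs on a 1080. Do I really have to write the OpenAI interface by hand? 😂 😂 😂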

ollama is ok
