EmbeddedLLM / vllm-rocm

vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
https://vllm.readthedocs.io
Apache License 2.0

Compatible GPU architectures #14

Closed TheJKM closed 7 months ago

TheJKM commented 7 months ago

Hi, awesome work! I have a question about supported GPU architectures, and I couldn't find anything about it in the repo. All your tests seem to be done on the MI210, which is a CDNA2 card. Does your vLLM ROCm port also work on other architectures, like RDNA 3 and RDNA 2, which are now supported by ROCm 5.7?

tanpinsiang commented 7 months ago

Thank you for your interest in our work. Due to the dependency on Flash Attention 2, the current version does not support the RDNA3 and RDNA2 architectures.

However, we are developing a version that does not rely on Flash Attention 2. If this alternative version interests you, please reach out to me via email.

tanpinsiang commented 7 months ago

According to https://github.com/ROCmSoftwarePlatform/flash-attention/blob/flash_attention_for_rocm/setup.py#L215, the supported architectures are CDNA2 (MI200) and CDNA3 (MI300).
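
For context, a quick way to check whether a local GPU falls inside that allow-list is to look at the gfx target reported by `rocminfo`. The sketch below is illustrative only and not part of this repo; the gfx-to-architecture mapping (gfx90a for MI200-series CDNA2, gfx940/941/942 for MI300-series CDNA3, gfx103x/gfx110x for RDNA2/RDNA3) is assumed from public ROCm documentation.

```python
import re
import subprocess

# gfx targets that the ROCm flash-attention setup.py allow-list reportedly covers
# (CDNA2 / CDNA3); RDNA2 (gfx103x) and RDNA3 (gfx110x) are absent from it.
SUPPORTED_GFX = {"gfx90a", "gfx940", "gfx941", "gfx942"}


def local_gfx_targets():
    """Run `rocminfo` and return the set of gfx targets it reports."""
    out = subprocess.run(
        ["rocminfo"], capture_output=True, text=True, check=True
    ).stdout
    return set(re.findall(r"gfx[0-9a-f]+", out))


if __name__ == "__main__":
    targets = local_gfx_targets()
    print("Detected gfx targets:", targets or "none")
    if targets & SUPPORTED_GFX:
        print("At least one GPU matches the flash-attention allow-list.")
    else:
        print("No CDNA2/CDNA3 GPU detected; flash-attention will not build for this GPU.")
```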

TheJKM commented 7 months ago

Thank you for your quick answer! My primary interest is in benchmarking, so I'm happy to wait until the other version is available.