EmbeddedLLM / vllm-rocm

vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
https://vllm.readthedocs.io
Apache License 2.0

Conflicting versions of PyTorch on ROCm #8

Closed JIANGJZ closed 7 months ago

JIANGJZ commented 7 months ago

vllm-rocm depends on flash_attention and requires PyTorch built for ROCm 5.7, while flash_attention appears to depend on PyTorch built for ROCm 5.4. How should I proceed to ensure vLLM runs smoothly? The AMD ROCm support in flash_attention isn't documented very clearly; it only mentions how flash_attention can be run inside Docker. Could you provide a tutorial for installing a version of PyTorch that is compatible with both vllm and flash_attention? I have run into many problems caused by conflicting versions of PyTorch on ROCm.
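
For reference, this is the minimal check I use to see which PyTorch build is actually installed in my environment (just a sketch; it assumes a ROCm build of PyTorch, where `torch.version.hip` is populated):

```python
import torch

# Report the installed PyTorch build and the ROCm (HIP) version it was compiled against.
print("PyTorch version:", torch.__version__)
print("HIP/ROCm version:", torch.version.hip)  # None on CUDA-only or CPU-only builds

# ROCm builds of PyTorch expose AMD GPUs through the torch.cuda namespace.
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
else:
    print("No ROCm-visible GPU detected")
```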