vllm-rocm depends on flash_attention, and it also relies on PyTorch built for ROCm 5.7, while flash_attention depends on PyTorch built for ROCm 5.4. How should I proceed to make sure vllm runs smoothly? The AMD ROCm support in flash_attention isn't documented very clearly; it only mentions how flash_attention can be run inside Docker. Could you provide a tutorial for installing a version of PyTorch that is compatible with both vllm and flash_attention? I have run into many problems because of these conflicting PyTorch-on-ROCm versions.
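For reference, this is roughly what I have been trying (a rough sketch, not a working recipe; the wheel index URL and the flash-attention fork URL are my assumptions based on the PyTorch ROCm install page and the repo vllm-rocm seems to reference, so please correct me if they are wrong):

```bash
# Install a PyTorch build targeting ROCm 5.7 (what vllm-rocm appears to expect).
# The index URL is my assumption from the PyTorch install instructions.
pip3 install torch --index-url https://download.pytorch.org/whl/rocm5.7

# Build the ROCm fork of flash-attention from source against that torch.
# This is where things break for me, since its instructions assume a
# PyTorch built for ROCm 5.4 (and otherwise only describe the Docker route).
git clone https://github.com/ROCmSoftwarePlatform/flash-attention.git
cd flash-attention
pip3 install .
```

If there is a known-good combination of PyTorch, flash_attention, and vllm versions for ROCm, a pointer to it would already help a lot.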