ROCm / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

[Issue]: Error in the implementation ? #50

Open PierreColombo opened 3 months ago

PierreColombo commented 3 months ago

Problem Description

Hello,

The model https://huggingface.co/databricks/dbrx-instruct is not working with flash attention on ROCm, although it works on NVIDIA 100.

The model fails on AMD MI250 when flash attention is enabled.

Concretely: on an MI250 node, load the model with attn_implementation="flash_attention_2".
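A minimal reproduction sketch of the load step described above, assuming `transformers` (a version with DBRX support) and the ROCm build of `flash-attn` are installed; the dtype and `device_map` choices are illustrative, not from the original report:

```python
MODEL_ID = "databricks/dbrx-instruct"

# Loading options that trigger the reported failure on ROCm MI250
# (the same options reportedly work on NVIDIA hardware).
load_kwargs = dict(
    torch_dtype="bfloat16",                    # bf16 is supported on MI250/MI250X
    attn_implementation="flash_attention_2",   # the setting at issue
    device_map="auto",
    trust_remote_code=True,                    # DBRX uses remote code
)

if __name__ == "__main__":
    # The actual load is heavyweight (large download, GPU memory),
    # so it is kept behind the main guard.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, **load_kwargs)
```

Dropping `attn_implementation="flash_attention_2"` (falling back to the default attention path) is the usual way to check whether the failure is specific to the flash-attention kernel.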

Operating System

ADASTRA

CPU

ADASTRA

GPU

AMD Instinct MI250X, AMD Instinct MI250

ROCm Version

ROCm 6.0.0

ROCm Component

No response

Steps to Reproduce

https://huggingface.co/databricks/dbrx-instruct/discussions/13

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response