ROCm / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

[Issue]: Error in the implementation ? #50

Open · PierreColombo opened 8 months ago

PierreColombo commented 8 months ago

Problem Description

Hello,

The model https://huggingface.co/databricks/dbrx-instruct is not working with flash attention on ROCm; the same setup works on NVIDIA A100.

The model fails on AMD MI250 when flash attention is enabled.

Concretely: take a node of MI250s and load the model with attn_implementation="flash_attention_2".

See: https://huggingface.co/databricks/dbrx-instruct
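The failing load can be sketched as follows (a minimal sketch of the reproduction, not code from the issue; the helper names and the dtype/device_map/trust_remote_code choices are assumptions):

```python
# Sketch: load dbrx-instruct with the FlashAttention-2 backend via
# Hugging Face transformers. Helper names and extra kwargs are assumptions.

def flash_attn_kwargs():
    """from_pretrained kwargs that request the FlashAttention-2 backend."""
    return {
        "attn_implementation": "flash_attention_2",  # select the flash-attn backend
        "torch_dtype": "bfloat16",                   # flash-attn requires fp16/bf16
        "device_map": "auto",                        # shard across the MI250 node
    }

def load_dbrx(model_id="databricks/dbrx-instruct"):
    # Imports deferred so flash_attn_kwargs stays importable without torch.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, trust_remote_code=True, **flash_attn_kwargs()
    )
    return tok, model
```

On an A100 node this load path reportedly succeeds; on an MI250 node with the ROCm flash-attn build it fails.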

Operating System

ADASTRA

CPU

ADASTRA

GPU

AMD Instinct MI250X, AMD Instinct MI250

ROCm Version

ROCm 6.0.0

ROCm Component

No response

Steps to Reproduce

https://huggingface.co/databricks/dbrx-instruct/discussions/13

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

harkgill-amd commented 1 month ago

Hi @PierreColombo, an internal ticket has been created to further investigate this issue.

schung-amd commented 3 weeks ago

Hi @PierreColombo, are you still experiencing this issue? If so, is this only occurring for dbrx-instruct, or do you see this with smaller models as well?
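One way to answer that is to isolate whether the ROCm flash-attn kernels themselves fail, independent of the dbrx modeling code, by calling flash_attn_func directly on a tiny input (a diagnostic sketch; the shapes and dtype are illustrative assumptions):

```python
# Diagnostic sketch: exercise the flash-attn kernel directly, bypassing
# transformers and dbrx entirely. Shapes/dtype below are assumptions.

def tiny_qkv_shape(batch=2, seqlen=128, nheads=4, headdim=64):
    """flash_attn_func expects tensors laid out as (batch, seqlen, nheads, headdim)."""
    return (batch, seqlen, nheads, headdim)

def run_kernel_check():
    # Imports deferred so the shape helper works without a GPU stack installed.
    import torch
    from flash_attn import flash_attn_func

    shape = tiny_qkv_shape()
    q, k, v = (
        torch.randn(shape, dtype=torch.float16, device="cuda") for _ in range(3)
    )
    out = flash_attn_func(q, k, v, causal=True)
    # Output layout matches the input layout.
    assert out.shape == torch.Size(shape)
    return out
```

If run_kernel_check passes on the MI250 node, the problem is more likely in the model-side integration (e.g. dbrx's attention code path) than in the kernels.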