ROCm/flash-attention
Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License
142 stars, 46 forks
issues
[Issue]: Expected dout_seq_stride == out_seq_stride to be true, but got false
#54
ehartford
closed
6 months ago
2
[Feature]: Support for newer flash-attention versions (e.g. ≥2.1.0)
#53
JiahuaZhao
opened
6 months ago
2
GPUAI-1250 - Flash Attention v2.04 two modules layer_norm cannot be used fixed
#52
xiaoxiangAMD
opened
7 months ago
0
[Issue]: RuntimeError: FlashAttention forward only supports head dimension at most 128
#51
xxtars
closed
6 months ago
2
[Issue]: Error in the implementation ?
#50
PierreColombo
opened
8 months ago
2
add benchmark script
#49
fsx950223
closed
8 months ago
2
add FA api benchmark csv
#48
fsx950223
opened
8 months ago
1
GPUAI-1250 - Flash Attention v2.04 module rotary cannot be used code fixed
#47
xiaoxiangAMD
opened
9 months ago
2
aac.amd: MI210 - roberta-large with sequence length 8192 and batch_size 1 fails
#46
michaelfeil
closed
1 week ago
2
[Feature]: Is there a Flash-Decoding algorithm implemented based on Composable kernel?
#45
zhangxiao-stack
opened
9 months ago
3
[Issue]: Backward performance
#44
netw0rkf10w
opened
9 months ago
1
[Issue]: Unstable training
#43
netw0rkf10w
opened
9 months ago
1
[Issue]: Installation failed through Dockerfile
#42
amdrenwuli
opened
9 months ago
4
[Issue]: RuntimeError: Expected dout_seq_stride == out_seq_stride to be true, but got false.
#41
donglixp
opened
9 months ago
15
[Issue]: Expected dout_seq_stride == out_seq_stride to be true, but got false
#40
ehartford
opened
10 months ago
14
Installation error
#39
ekazakos
closed
1 week ago
3
Allow gfx908 to build
#38
luizanao
closed
9 months ago
0
Support for MI100 gfx908
#37
luizanao
closed
9 months ago
0
Another installation error
#36
ekazakos
closed
10 months ago
1
Merge to upstream flash-attention repo
#35
ehartford
opened
10 months ago
13
Support for other modules (rotary, xentropy, layer_norm)
#34
bbartoldson
opened
10 months ago
4
replace kernel implementation using CK tile-programming performant kernels
#33
carlushuang
opened
10 months ago
1
Not working on MI250
#32
PierreColombo
closed
10 months ago
0
undefined symbol: hipGetDevicePropertiesR0600
#31
alain40
opened
11 months ago
3
can mask be supported?
#30
unwritten
opened
11 months ago
0
Mi50 Support
#29
YehowshuaScaled
opened
11 months ago
5
installation error
#28
donglixp
opened
11 months ago
14
RDNA3 support
#27
WilliamGazeley
opened
11 months ago
76
Is this v2 or v1?
#26
netw0rkf10w
closed
11 months ago
4
installation error of Method 1 with the recommended docker
#25
donglixp
closed
11 months ago
4
MI100 Support
#24
LoggerHead22
opened
12 months ago
20
Make installation steps look better
#23
Naomiusearch
closed
11 months ago
0
Feature request: Sliding Window Attention
#22
tjtanaa
opened
1 year ago
6
Support mfma_f32_16x16x16f16
#21
hclearner
closed
11 months ago
6
Install failed
#20
1787648106
opened
1 year ago
12
Remove Hardcoded Building Options
#19
dejay-vu
closed
1 year ago
2
Remove offload-arch=native in the build
#18
fxmarty
closed
12 months ago
6
Ifu mqa
#17
guangzlu
closed
1 year ago
0
Add MQA & GQA
#16
guangzlu
closed
1 year ago
1
bwd optimizing based on profiling
#15
guangzlu
closed
1 year ago
2
IFU to v2.0.4
#14
dejay-vu
closed
1 year ago
3
compatiable with xformers
#13
fsx950223
closed
1 year ago
5
Optimized API for packed conditions
#12
guangzlu
closed
1 year ago
0
add rocm benchmark script
#11
fsx950223
closed
8 months ago
1
Optimization based on profiling for forward
#10
guangzlu
closed
1 year ago
0
add batch api
#9
fsx950223
closed
1 year ago
0
Remove patch
#8
groenenboomj
closed
1 year ago
3
Increasing the compiling time by spliting into several cpp files
#7
dejay-vu
closed
1 year ago
1
Enable both Qloop and Kloop
#6
guangzlu
closed
1 year ago
1
Enable both Qloop and Kloop
#5
guangzlu
closed
1 year ago
3