66RING / tiny-flash-attention

Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS

causal masking #2

Closed: wisdom-miao closed this 5 months ago

wisdom-miao commented 5 months ago

Is the masking logic not implemented?

66RING commented 5 months ago

@wisdom-miao Causal masking is supported, and it is hard-coded: the kernel always runs in causal mode and does not accept a manually supplied attention mask.

The causal-mode support shows up in one place: blocks in the upper triangle are skipped outright, so no computation is done for them.
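That skipping strategy can be sketched in plain NumPy. This is a hypothetical `causal_flash_attention` helper for illustration only, not the repo's actual Triton/CUDA/CUTLASS kernel: for each query block, any key block that starts past the last query row lies entirely in the upper triangle and is skipped, while the partial diagonal block is masked element-wise; the online-softmax accumulators follow the standard flash-attention recurrence.

```python
import numpy as np

def causal_flash_attention(q, k, v, block=16):
    """Blocked causal attention sketch (hypothetical helper, not the repo's kernel)."""
    n, d = q.shape
    out = np.zeros_like(v, dtype=np.float64)
    for qs in range(0, n, block):
        qe = min(qs + block, n)
        qb = q[qs:qe]
        # Online-softmax accumulators for this query block.
        m = np.full(qe - qs, -np.inf)          # running row max
        l = np.zeros(qe - qs)                   # running softmax denominator
        acc = np.zeros((qe - qs, v.shape[1]))   # running weighted sum of V
        for ks in range(0, n, block):
            if ks > qe - 1:
                # Whole key block sits in the upper triangle: skip it entirely.
                break
            ke = min(ks + block, n)
            s = qb @ k[ks:ke].T / np.sqrt(d)
            # Element-wise mask for the partial diagonal block.
            qi = np.arange(qs, qe)[:, None]
            kj = np.arange(ks, ke)[None, :]
            s = np.where(kj <= qi, s, -np.inf)
            # Online softmax update.
            m_new = np.maximum(m, s.max(axis=1))
            p = np.exp(s - m_new[:, None])
            scale = np.exp(m - m_new)
            l = l * scale + p.sum(axis=1)
            acc = acc * scale[:, None] + p @ v[ks:ke]
            m = m_new
        out[qs:qe] = acc / l[:, None]
    return out
```

The early `break` is where the savings come from: for causal attention roughly half of the key blocks never touch the GEMM at all, which is why hard-coding causal mode is cheaper than applying a full mask after the fact.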

wisdom-miao commented 5 months ago

[image] Is it this part?

66RING commented 5 months ago

@wisdom-miao Yes.

The CUTLASS version is here: [image]

wisdom-miao commented 5 months ago

Got it, thanks a lot! Another day of learning from the master.

66RING commented 5 months ago

Closing for now; feel free to reopen anytime.