Dao-AILab / flash-attention

Fast and memory-efficient exact attention

[QST] flash_attn2: why is tOrVt not swizzled? #1107


itsliupeng commented 3 months ago

In the code at this link, the line reads:

Tensor tOrVt = thr_mma.partition_fragment_B(sVtNoSwizzle);

Could you explain why sVtNoSwizzle is used here instead of simply using sVt? Thanks in advance for your clarification!
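For background on what "NoSwizzle" means here: in CuTe, a swizzle composed onto a layout only permutes the offsets the layout produces (to avoid shared-memory bank conflicts); the logical shape is unchanged. Below is a minimal host-side sketch of that, not taken from flash-attention: it assumes CUTLASS's CuTe headers are on the include path (compile with e.g. nvcc -std=c++17 -I cutlass/include), and the Swizzle<3, 0, 3> pattern is an arbitrary choice for an 8x8 tile.

#include <cstdio>
#include <cute/tensor.hpp>
using namespace cute;

int main() {
  // A plain 8x8 row-major layout, standing in for a "NoSwizzle" smem layout.
  auto no_swizzle = make_layout(Shape<_8, _8>{}, GenRowMajor{});
  // The same layout with an XOR swizzle composed on top, the way swizzled
  // smem layouts are typically built in CUTLASS kernels.
  auto swizzled = composition(Swizzle<3, 0, 3>{}, no_swizzle);

  // Both views have the same 8x8 shape; only the offset mapping differs.
  for (int i = 0; i < 8; ++i) {
    for (int j = 0; j < 8; ++j) {
      printf("%2d->%2d  ", int(no_swizzle(i, j)), int(swizzled(i, j)));
    }
    printf("\n");
  }
  return 0;
}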

tridao commented 3 months ago

I don't know, to be honest. The result was wrong without NoSwizzle.
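A plausible reading, offered as an editorial note rather than something confirmed in the thread: partition_fragment_B allocates a register-backed fragment, for which only the tile's shape matters, while the swizzle is an address permutation that is only meaningful when real shared-memory offsets are computed. In the forward kernel, the smem-to-register copy is partitioned from the swizzled view, so loads still go through the swizzle. An abridged sketch of that surrounding pattern (names follow the kernel code, but this is not a standalone program):

// Register fragment: only the shape of the argument is consumed here,
// so the plain (non-swizzled) layout suffices.
Tensor tOrVt = thr_mma.partition_fragment_B(sVtNoSwizzle);

// The smem -> register copy must compute actual shared-memory addresses,
// so it partitions the swizzled view sVt instead:
auto smem_tiled_copy_V = make_tiled_copy_B(typename Kernel_traits::SmemCopyAtomTransposed{}, tiled_mma);
auto smem_thr_copy_V   = smem_tiled_copy_V.get_thread_slice(tidx);
Tensor tOsVt           = smem_thr_copy_V.partition_S(sVt);  // note: swizzled sVt here

Under this reading, passing the swizzled sVt to partition_fragment_B would bake the swizzle into the fragment's layout, which could account for the wrong result mentioned above; again, the thread itself does not confirm this.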