Closed: flxst closed this 1 month ago
This PR implements manual attention and PyTorch flash attention, in addition to the previously implemented Dao flash attention. Grouped Query Attention is supported.
manual attention
pytorch flash attention
dao flash attention
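To illustrate the relationship between the variants, here is a minimal sketch (not the PR's actual code; function and variable names are illustrative) of manual attention alongside PyTorch's fused `scaled_dot_product_attention`, with Grouped Query Attention handled by expanding the KV heads to match the query heads:

```python
import torch
import torch.nn.functional as F

def manual_attention(q, k, v):
    # q, k, v: (batch, n_heads, seq_len, head_dim)
    scale = q.shape[-1] ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, v)

def repeat_kv(x, n_rep):
    # Grouped Query Attention: each KV head is shared by n_rep query heads,
    # so repeat the KV heads along the head dimension before attending.
    if n_rep == 1:
        return x
    return x.repeat_interleave(n_rep, dim=1)

# Hypothetical shapes: 8 query heads sharing 2 KV heads (GQA).
b, n_q_heads, n_kv_heads, s, d = 2, 8, 2, 16, 64
q = torch.randn(b, n_q_heads, s, d)
k = torch.randn(b, n_kv_heads, s, d)
v = torch.randn(b, n_kv_heads, s, d)

k_rep = repeat_kv(k, n_q_heads // n_kv_heads)
v_rep = repeat_kv(v, n_q_heads // n_kv_heads)

out_manual = manual_attention(q, k_rep, v_rep)
# PyTorch's fused path; it dispatches to a flash/memory-efficient kernel
# when the hardware and dtypes allow it.
out_sdpa = F.scaled_dot_product_attention(q, k_rep, v_rep)
```

Both paths should agree numerically; the Dao flash attention kernel computes the same quantity with a tiled, memory-efficient algorithm on GPU.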