mayank31398 closed 1 year ago
Since FlashAttention only supports no mask or a causal mask, it's better to throw an error here.
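A minimal sketch of such a fail-fast check, in plain Python so it stays framework-agnostic. The function names (is_causal_mask, check_flash_attention_mask), the use_flash_attn flag, and the mask convention (True means "do not attend") are all hypothetical and not taken from the codebase under review:

```python
from typing import Optional, Sequence


def is_causal_mask(mask: Sequence[Sequence[bool]]) -> bool:
    """Return True if `mask` is a standard causal mask.

    Assumed convention: mask[i][j] is True when query i must NOT attend
    to key j, so a causal mask hides exactly the keys with j > i.
    """
    n = len(mask)
    return all(
        len(row) == n and all(row[j] == (j > i) for j in range(n))
        for i, row in enumerate(mask)
    )


def check_flash_attention_mask(
    use_flash_attn: bool, mask: Optional[Sequence[Sequence[bool]]]
) -> None:
    """Fail fast if FlashAttention is requested with an unsupported mask."""
    if not use_flash_attn or mask is None:
        # FlashAttention is off, or there is no mask at all: nothing to check.
        return
    if not is_causal_mask(mask):
        raise ValueError(
            "FlashAttention only supports no mask or a causal mask; "
            "got an arbitrary attention mask."
        )
```

For example, a mask that also blocks attention across document boundaries (block-diagonal causal) is rejected, while a plain causal mask or no mask passes. Raising early is cheaper to debug than silently ignoring a mask the kernel cannot honor.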
You also mentioned that --reset-position-ids is a problem. Does this also need to be handled?
I don't think that needs to be handled. It should work with any position IDs.
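A rough illustration of why position IDs are independent of the mask, assuming they are consumed only by the positional embedding (learned or rotary) and never used to build the attention mask. The helper below is hypothetical: it restarts positions after each end-of-document token, and the attention mask it would be paired with is still the ordinary causal one:

```python
from typing import List


def reset_position_ids(tokens: List[int], eod_token: int) -> List[int]:
    """Restart position IDs after every end-of-document (EOD) token.

    The EOD token keeps the position of the document it ends; the token
    after it starts again at 0. Only the positions change here; nothing
    in this function touches the attention mask.
    """
    position_ids = []
    pos = 0
    for tok in tokens:
        position_ids.append(pos)
        pos = 0 if tok == eod_token else pos + 1
    return position_ids


# Two documents packed into one sequence, with 0 as the (assumed) EOD id:
# reset_position_ids([5, 6, 0, 7, 8], eod_token=0) gives [0, 1, 2, 0, 1]
```

Because the kernel only sees the already position-encoded queries and keys plus the mask, any choice of position IDs leaves the mask causal and therefore FlashAttention-compatible.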