1) Added causal masking support to the MHA, MQA and GQA forward kernel
2) Causal masking now works with dissimilar sequence lengths
3) Removed vector bias. This is unrelated to causal masking, but it is not needed and I did not want to keep maintaining unnecessary code
4) A couple of other unrelated bugfixes found during code review
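
For reviewers, here is a minimal reference sketch of what causal masking means when the query and key sequence lengths differ. It is not the kernel code. It assumes the mask is aligned to the bottom-right corner (the last query lines up with the last key), and that alignment is an assumption of this sketch. The names `causal_mask`, `seqlen_q` and `seqlen_k` are hypothetical and used only for illustration.

```python
def causal_mask(seqlen_q, seqlen_k):
    """Build a seqlen_q x seqlen_k mask; True means the query may attend to the key.

    Assumption: bottom-right alignment, so the last query row sees every key
    and each earlier query row sees one key fewer.
    """
    # Shift between query and key positions when the lengths differ.
    offset = seqlen_k - seqlen_q
    return [
        [k <= q + offset for k in range(seqlen_k)]
        for q in range(seqlen_q)
    ]


if __name__ == "__main__":
    # Print the mask for 2 queries against 4 keys, one row per query.
    for row in causal_mask(2, 4):
        print("".join("1" if allowed else "0" for allowed in row))
```

With equal lengths this reduces to the usual lower-triangular mask. When `seqlen_q` is larger than `seqlen_k`, the earliest query rows have no visible keys under this alignment, so a kernel has to handle fully masked rows without producing NaNs in the softmax.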