`TransformerBlock` now uses Flash Attention when `use_rel_pos` isn't needed.
Note: I only tested this briefly with an untrained model to check for obvious crashes, and will continue with actual testing once I finish building a new training loop for my models. It runs, but the results might not be correct.
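For context, a minimal sketch of what the dispatch might look like, assuming a PyTorch attention module with an optional `use_rel_pos` flag (the class and parameter names here are illustrative, not the actual implementation in this PR):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Attention(nn.Module):
    """Multi-head attention that takes the fused (Flash) path via
    scaled_dot_product_attention when relative positional embeddings
    are not used. Illustrative sketch only."""

    def __init__(self, dim: int, num_heads: int = 8, use_rel_pos: bool = False):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.use_rel_pos = use_rel_pos
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, _ = x.shape
        # (B, N, 3, heads, head_dim) -> (3, B, heads, N, head_dim)
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)

        if not self.use_rel_pos:
            # Fast path: PyTorch selects a fused (Flash) kernel when available.
            out = F.scaled_dot_product_attention(q, k, v)
        else:
            # Slow path: explicit attention so relative position biases can be
            # added to `attn` before the softmax (bias computation omitted here).
            attn = (q * self.scale) @ k.transpose(-2, -1)
            attn = attn.softmax(dim=-1)
            out = attn @ v

        out = out.transpose(1, 2).reshape(B, N, -1)
        return self.proj(out)
```

The idea is simply that the fused kernel cannot accept the per-head relative position bias, so the explicit-softmax path is kept only for the `use_rel_pos=True` case.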