Closed by VikParuchuri 4 weeks ago.
Torch 2.5 is noticeably slower than 2.4.1. I suspect something changed in the SDPA (scaled dot-product attention) implementation.
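One way to check whether the slowdown really comes from SDPA is to time `torch.nn.functional.scaled_dot_product_attention` on its own under each Torch version. The following is a minimal sketch. The tensor shapes, dtypes and the helper name `time_sdpa` are illustrative assumptions, not the shapes from the original workload.

```python
import torch
import torch.nn.functional as F
from torch.utils import benchmark


def time_sdpa(batch=4, heads=8, seq_len=512, head_dim=64,
              device=None, dtype=None, min_run_time=0.5):
    """Return the median runtime (seconds) of one SDPA call.

    Hypothetical helper: the default sizes are arbitrary and should be
    replaced with the shapes your model actually uses.
    """
    device = device or ("cuda" if torch.cuda.is_available() else "cpu")
    # Half precision on GPU (where fused kernels apply), float32 on CPU.
    dtype = dtype or (torch.float16 if device == "cuda" else torch.float32)

    # Inputs in the (batch, heads, seq_len, head_dim) layout SDPA expects.
    q = torch.randn(batch, heads, seq_len, head_dim, device=device, dtype=dtype)
    k = torch.randn_like(q)
    v = torch.randn_like(q)

    # One warm-up call so kernel selection and allocation are not timed.
    F.scaled_dot_product_attention(q, k, v)

    # torch.utils.benchmark.Timer handles CUDA synchronization for us.
    timer = benchmark.Timer(
        stmt="F.scaled_dot_product_attention(q, k, v)",
        globals={"F": F, "q": q, "k": k, "v": v},
    )
    return timer.blocked_autorange(min_run_time=min_run_time).median


if __name__ == "__main__":
    median_s = time_sdpa()
    print(f"torch {torch.__version__}: SDPA median {median_s * 1e3:.3f} ms")
```

Running the same script in a 2.4.1 environment and a 2.5 environment on the same hardware gives a like-for-like comparison of the attention call alone, separate from the rest of the model.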