ColfaxResearch / cutlass-kernels

MIT License

Allow tile sizes to be multiples of 16 #6

Closed jayhshah closed 5 months ago

jayhshah commented 5 months ago

This pull request fixes a problem that prevented tile sizes below 64 from being used even though they should be allowed. In practice, though, these small tile sizes are not expected to improve performance except in some edge cases with short sequence lengths.
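As a rough illustration of the constraint in the title, the sketch below checks whether a tile size is a multiple of 16 and counts how many padded rows a given tile size leaves for a sequence length. The helper names (`is_valid_tile_size`, `num_tiles`, `padded_rows`) are hypothetical and do not come from this repository. Reduced padding is only one plausible reason a small tile could help at short sequence lengths; it is an assumption, not a claim about the kernels here.

```cpp
#include <cassert>

// Hypothetical helper (not from the repository): a tile size is treated as
// valid when it is a positive multiple of 16, as the PR title suggests.
constexpr bool is_valid_tile_size(int tile) {
    return tile > 0 && tile % 16 == 0;
}

// Number of tiles needed to cover seq_len rows (ceiling division).
constexpr int num_tiles(int seq_len, int tile) {
    return (seq_len + tile - 1) / tile;
}

// Rows of padding in the final tile when seq_len is not a multiple of tile.
// Assumes the caller passes a valid tile size; checked at run time in debug builds.
inline int padded_rows(int seq_len, int tile) {
    assert(is_valid_tile_size(tile));
    return num_tiles(seq_len, tile) * tile - seq_len;
}

// Compile-time checks of the validity rule assumed above.
static_assert(is_valid_tile_size(16), "16 is a multiple of 16");
static_assert(is_valid_tile_size(48), "48 is a multiple of 16");
static_assert(!is_valid_tile_size(40), "40 is not a multiple of 16");
```

For example, under these assumptions a sequence of 32 rows leaves 32 padded rows with a tile size of 64 and none with a tile size of 32, which is the kind of short-sequence edge case where a smaller tile might pay off.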