amirzandieh / HyperAttention

Triton Implementation of HyperAttention Algorithm
Apache License 2.0

Unexpected longer inference time when input size is smaller #4

Open complexfilter opened 4 months ago

complexfilter commented 4 months ago

Hi,

I benchmarked with batch_size, head_size, dim, seq_len = (4, 8, 64, 16384) and measured a runtime of 3.24 ms. However, with batch_size, head_size, dim, seq_len = (4, 8, 128, 16384), where dim is twice as large, the runtime is only 2.03 ms.

Since the smaller input takes longer than the larger one, this suggests there may still be room to optimize the code, at least for the smaller dim.
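
To help rule out measurement noise when comparing the two configurations, here is a minimal sketch of a timing harness. It is not the script that produced the numbers above. The names `benchmark`, `dummy_workload`, and the `sync` parameter are hypothetical, and the workload is a pure-Python stand-in for the attention call. On a GPU, one would presumably pass `torch.cuda.synchronize` as `sync`, because CUDA kernel launches are asynchronous. The warm-up iterations are there to absorb one-time costs such as JIT compilation.

```python
import statistics
import time


def benchmark(fn, warmup=3, repeats=10, sync=None):
    """Return the median wall-clock time of fn() in milliseconds.

    sync is an optional zero-argument callable invoked before each clock
    read (assumption: for a CUDA workload this would be
    torch.cuda.synchronize, so that queued kernels have finished).
    """
    for _ in range(warmup):  # absorb one-time costs, e.g. JIT compilation
        fn()
    if sync is not None:
        sync()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


def dummy_workload(dim, seq_len=2048):
    """Hypothetical stand-in for the attention call; work grows with dim."""
    def run():
        total = 0
        for i in range(seq_len * dim // 64):
            total += i
        return total
    return run


if __name__ == "__main__":
    for dim in (64, 128):
        ms = benchmark(dummy_workload(dim))
        print(f"dim={dim}: {ms:.3f} ms (median of 10 runs)")
```

Reporting the median of several synchronized runs, rather than a single measurement, makes it easier to tell whether the dim = 64 case is really slower or whether the first call was simply paying a one-time cost.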