Hugh-reflexion opened this issue 9 months ago
Hi Ajay, in the "Multi Head Attention" section, the output of scaled_dot_product needs to be reshaped, but I think the head dimension and the sequence dimension must be permuted before the reshape. Please let me know if I'm wrong.
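A quick way to check this point is a small sketch. It assumes the `values` tensor coming out of scaled_dot_product has shape (batch, num_heads, seq_len, head_dim), which is common in PyTorch implementations. In PyTorch the permute-then-reshape would be `values.permute(0, 2, 1, 3).reshape(batch_size, seq_len, num_heads * head_dim)`. The code below imitates that in plain Python with nested lists so it runs without PyTorch. `flatten`, `reshape`, `swap_head_and_seq` and the tiny sizes are hypothetical helpers for illustration, not code from the tutorial. Each element is tagged with its (head, token, feature) index so you can see which tokens end up in each output row.

```python
# Hypothetical sizes; "values" stands in for the output of scaled_dot_product,
# ASSUMED to have layout [batch][num_heads][seq_len][head_dim].
B, H, S, D = 1, 2, 3, 2

def flatten(x):
    """Flatten nested lists in row-major (C) order, like contiguous tensor memory."""
    if isinstance(x, list):
        return [v for item in x for v in flatten(item)]
    return [x]

def reshape(flat, shape):
    """Regroup a flat list into nested lists of the given shape (row-major)."""
    if len(shape) == 1:
        return list(flat)
    step = len(flat) // shape[0]
    return [reshape(flat[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]

def swap_head_and_seq(x):
    """[batch][head][seq][dim] -> [batch][seq][head][dim], like x.permute(0, 2, 1, 3)."""
    return [[[x[b][h][s] for h in range(len(x[b]))]
             for s in range(len(x[b][0]))]
            for b in range(len(x))]

# Tag every element with (head, token, feature) so we can track where it lands.
values = [[[[(h, s, d) for d in range(D)] for s in range(S)] for h in range(H)]
          for b in range(B)]

# Reshape directly: rows of length H*D are cut out of head-major memory.
wrong = reshape(flatten(values), [B, S, H * D])
# Permute first, then reshape: each row concatenates all heads for one token.
right = reshape(flatten(swap_head_and_seq(values)), [B, S, H * D])

def tokens_in(row):
    """Set of sequence positions whose features appear in one output row."""
    return {s for (_, s, _) in row}

for s in range(S):
    print(s, "direct:", sorted(tokens_in(wrong[0][s])),
          "permuted:", sorted(tokens_in(right[0][s])))
# 0 direct: [0, 1] permuted: [0]
# 1 direct: [0, 2] permuted: [1]
# 2 direct: [1, 2] permuted: [2]
```

Under that assumed layout, reshaping directly mixes features from different tokens into one row, while permuting first gives each row the concatenated heads of a single token. In PyTorch, `reshape` (unlike `view`) also handles the non-contiguous tensor that `permute` returns.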
Hello, I have the same thought. Have you gotten any response? I'm quite confused now.