ajhalthor / Transformer-Neural-Network

Code Transformer neural network components piece by piece
MIT License

permute before reshape #8

Open Hugh-reflexion opened 9 months ago

Hugh-reflexion commented 9 months ago

Hi Ajay, in the "Multi Head Attention" section, the output of scaled_dot_product is reshaped, but I think the head dimension and the sequence dimension need to be permuted before the reshape. Please let me know if I'm wrong.
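Here is a minimal sketch of the concern in plain Python (no PyTorch). It assumes the attention output for one batch element is laid out as (num_heads, seq_len, head_dim), with heads before positions. The helper names are hypothetical and not from the repository. Each entry is tagged with its own (head, position, feature) index, so you can see which positions end up in each output row. Reshaping the row-major data directly mixes different positions into one row. Swapping the head and sequence axes first keeps each row to a single position, with all heads concatenated. In PyTorch terms, with a batch dimension in front, the second version corresponds to something like `values.permute(0, 2, 1, 3).reshape(batch_size, seq_len, num_heads * head_dim)`.

```python
def make_values(num_heads, seq_len, head_dim):
    # Hypothetical attention output for one batch element, laid out as
    # (num_heads, seq_len, head_dim); each entry is its own (head, position, feature) index.
    return [[[(h, s, d) for d in range(head_dim)]
             for s in range(seq_len)]
            for h in range(num_heads)]


def reshape_without_permute(values):
    # Mimics calling reshape directly: flatten in row-major order,
    # then cut the flat list into seq_len rows of num_heads * head_dim entries.
    num_heads, seq_len, head_dim = len(values), len(values[0]), len(values[0][0])
    flat = [x for head in values for row in head for x in row]
    width = num_heads * head_dim
    return [flat[i * width:(i + 1) * width] for i in range(seq_len)]


def permute_then_reshape(values):
    # Swap the head and sequence axes first, giving (seq_len, num_heads, head_dim),
    # then concatenate all heads' features for each position.
    num_heads, seq_len = len(values), len(values[0])
    return [[x for h in range(num_heads) for x in values[h][s]]
            for s in range(seq_len)]


def positions(rows):
    # Which sequence position each entry in each output row came from.
    return [[pos for (_, pos, _) in row] for row in rows]


values = make_values(num_heads=2, seq_len=3, head_dim=2)
print(positions(reshape_without_permute(values)))  # [[0, 0, 1, 1], [2, 2, 0, 0], [1, 1, 2, 2]]
print(positions(permute_then_reshape(values)))     # [[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]]
```

Without the permute, row 1 contains position 2 from head 0 and position 0 from head 1. With the permute, row s holds only position s, with head 0's features followed by head 1's, which is the per-position concatenation of heads that multi-head attention is meant to produce.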

Infernus-WIND commented 4 months ago

Hello, I had the same thought. Did you get any response? I'm pretty confused now.