LxMLS / lxmls-guide

Lisbon Machine Learning Summer School Lab Guide

[feat] Break down attention mechanism. Add figures #147

Closed tamohannes closed 1 year ago

tamohannes commented 1 year ago

Added 2 figures to illustrate how the K, Q, and V matrices are formed and to showcase the attention operation. Explained the key, query, and value terms using the analogy of a retrieval mechanism.

israfelsr commented 1 year ago

I would modify some of the names of the dimensions to make them clearer. Something like:

- B: batch size (I wouldn't add this)
- C: context sequence length
- E: embedding dimension
- H: hidden dimension
- Q: query sequence length

I would also make the drawings about cross-attention, using a different Q and C, and then just mention that self-attention uses the same input for both query and context.

Also, there are two small errors in the equation: the operation inside the softmax is $Q \cdot K^T$, whose dimension should be $B \times Q \times C$. Then it is $\mathrm{softmax}(\cdot) \cdot V$, and you end up with $B \times Q \times H$.
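
For concreteness, here is a minimal NumPy sketch of the corrected shapes (the sizes and weight matrices are illustrative, not taken from the figures; the dimension names follow the B, Q, C, E, H convention suggested above):

```python
import numpy as np

# Illustrative sizes: B = batch, Q = query length, C = context length,
# E = embedding dim, H = hidden dim.
B, Q, C, E, H = 2, 5, 7, 16, 8

rng = np.random.default_rng(0)
query_input = rng.normal(size=(B, Q, E))    # queries come from one sequence
context_input = rng.normal(size=(B, C, E))  # keys/values from another (cross-attention)

W_q = rng.normal(size=(E, H))
W_k = rng.normal(size=(E, H))
W_v = rng.normal(size=(E, H))

Qm = query_input @ W_q      # B x Q x H
Km = context_input @ W_k    # B x C x H
Vm = context_input @ W_v    # B x C x H

scores = Qm @ Km.transpose(0, 2, 1) / np.sqrt(H)      # B x Q x C
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)              # softmax over the context axis
output = weights @ Vm                                  # B x Q x H

print(scores.shape, output.shape)  # (2, 5, 7) (2, 5, 8)
# Self-attention is the special case context_input = query_input (so Q == C).
```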