Summary
AttributeError: module 'oneflow.nn' has no attribute 'MultiheadAttention'
Developing the MultiheadAttention module currently depends on quite a few other modules, so the plan is to work around it on the Python side first.
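To make that plan concrete, here is a minimal sketch of what a Python-side composition could look like, assuming OneFlow's PyTorch-aligned tensor API (`flow.matmul`, `flow.softmax`, `nn.Linear`). The class name and the use of separate q/k/v projections instead of a fused qkv weight are illustrative choices, not the final design.

```python
# A minimal Python-side sketch, assuming OneFlow's torch-aligned API.
import math
import oneflow as flow
import oneflow.nn as nn


class NaiveMultiheadAttention(nn.Module):
    def __init__(self, embed_dim, num_heads):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # Separate projections instead of the fused qkv weight used by PyTorch.
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.k_proj = nn.Linear(embed_dim, embed_dim)
        self.v_proj = nn.Linear(embed_dim, embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)

    def _split_heads(self, x, batch_size):
        # (batch, seq, embed_dim) -> (batch, num_heads, seq, head_dim)
        return x.reshape(batch_size, -1, self.num_heads, self.head_dim).permute(0, 2, 1, 3)

    def forward(self, query, key, value):
        # query/key/value: (batch_size, seq_len, embed_dim)
        b, s, e = query.shape
        q = self._split_heads(self.q_proj(query), b)
        k = self._split_heads(self.k_proj(key), b)
        v = self._split_heads(self.v_proj(value), b)
        scores = flow.matmul(q, k.transpose(-2, -1)) / math.sqrt(self.head_dim)
        attn = flow.softmax(scores, dim=-1)              # per-head attention weights
        out = flow.matmul(attn, v)                       # (b, num_heads, s, head_dim)
        out = out.permute(0, 2, 1, 3).reshape(b, s, e)   # merge heads back
        # Note: attn is returned per head, unlike nn.MultiheadAttention's averaged default.
        return self.out_proj(out), attn
```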
Introduction to MultiheadAttention
MultiheadAttention is a PyTorch module that implements the multi-head attention mechanism used in transformer architectures¹. It takes inputs of shape (batch_size, seq_len, hidden_dim) and returns an output tensor of the same shape (batch_size, seq_len, hidden_dim).

The multi-head attention mechanism computes attention scores between different parts of the input sequence. It does this by running several attention heads in parallel, each with its own set of projection parameters¹.
Source: conversation with Bing, 2023/4/4
(1) MultiHeadAttention实现详解 - 知乎. https://zhuanlan.zhihu.com/p/358206572 (accessed 2023/4/4).
(2) MultiHeadAttention实现详解 | Finisky Garden. https://finisky.github.io/2020/05/25/multiheadattention/ (accessed 2023/4/4).
(3) マルチヘッドアテンション (Multi-head Attention) [Transformerの部品]. https://cvml-expertguide.net/terms/dl/seq2seq-translation/transformer/multi-head-attention/ (accessed 2023/4/4).
(4) MultiheadAttention — PyTorch 2.0 documentation. https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html (accessed 2023/4/4).
(5) tf.keras.layers.MultiHeadAttention | TensorFlow v2.12.0. https://www.tensorflow.org/api_docs/python/tf/keras/layers/MultiHeadAttention (accessed 2023/4/4).
(6) MultiHeadAttention layer - Keras. https://keras.io/api/layers/attention_layers/multi_head_attention/ (accessed 2023/4/4).
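For reference, a small usage sketch of the PyTorch module described above (the dimensions are arbitrary example values). Note that the (batch_size, seq_len, hidden_dim) layout requires constructing the module with `batch_first=True`; the default layout is (seq_len, batch_size, hidden_dim).

```python
# Minimal usage sketch of torch.nn.MultiheadAttention with example dimensions.
import torch
import torch.nn as nn

batch_size, seq_len, hidden_dim, num_heads = 2, 16, 64, 8

# batch_first=True makes the module accept (batch_size, seq_len, hidden_dim).
mha = nn.MultiheadAttention(embed_dim=hidden_dim, num_heads=num_heads, batch_first=True)

x = torch.randn(batch_size, seq_len, hidden_dim)
# Self-attention: query, key, and value are the same tensor.
attn_output, attn_weights = mha(x, x, x)

print(attn_output.shape)   # torch.Size([2, 16, 64])
print(attn_weights.shape)  # torch.Size([2, 16, 16]), averaged over heads by default
```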
PyTorch
_native_multi_head_attention
Declaration

```yaml
# aten/src/ATen/native/native_functions.yaml
- func: _native_multi_head_attention(Tensor query, Tensor key, Tensor value, int embed_dim, int num_head, Tensor qkv_weight, Tensor qkv_bias, Tensor proj_weight, Tensor proj_bias, Tensor? mask=None, bool need_weights=True, bool average_attn_weights=True, int? mask_type=None) -> (Tensor, Tensor)
  variants: function
  dispatch:
    CPU, NestedTensorCPU: native_multi_head_attention_cpu
    CUDA, NestedTensorCUDA: native_multi_head_attention_cuda
  autogen: _native_multi_head_attention.out
```

- CPU implementation: `aten/src/ATen/native/transformers/attention.cpp`
- CUDA implementation: `aten/src/ATen/native/transformers/cuda/attention.cu`
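Because the declaration has `variants: function`, the op is also reachable from Python as `torch._native_multi_head_attention`. Below is a hedged sketch of a direct call following the signature above; the fused qkv weight layout `(3 * embed_dim, embed_dim)` mirrors `nn.MultiheadAttention`'s `in_proj_weight` and is an assumption here, not something stated in the declaration.

```python
# Direct call to the private ATen op; argument order follows the declaration
# above. Weight shapes are assumptions based on nn.MultiheadAttention's
# fused in_proj layout.
import torch

batch_size, seq_len, embed_dim, num_heads = 2, 16, 64, 8

query = key = value = torch.randn(batch_size, seq_len, embed_dim)

qkv_weight = torch.randn(3 * embed_dim, embed_dim)  # fused q/k/v projection
qkv_bias = torch.zeros(3 * embed_dim)
proj_weight = torch.randn(embed_dim, embed_dim)      # output projection
proj_bias = torch.zeros(embed_dim)

attn_output, attn_weights = torch._native_multi_head_attention(
    query, key, value, embed_dim, num_heads,
    qkv_weight, qkv_bias, proj_weight, proj_bias,
    mask=None, need_weights=True, average_attn_weights=True, mask_type=None,
)
print(attn_output.shape)  # expected: torch.Size([2, 16, 64])
```

Reference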