Oneflow-Inc / OneFlow-Pruning

[CVPR-2023] Towards Any Structural Pruning
https://arxiv.org/abs/2301.12900
MIT License

MultiheadAttention 模块资料收集 #2

Closed ccssu closed 1 year ago

ccssu commented 1 year ago

Summary

AttributeError: module 'oneflow.nn' has no attribute 'MultiheadAttention'

Implementing the MultiheadAttention module currently requires quite a few missing building blocks, so the plan is to work around it on the Python side first.
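A minimal sketch of such a Python-side fallback, assembled only from primitives that `oneflow.nn` already provides (`nn.Linear`, `flow.matmul`, `flow.softmax`). The class name `SimpleMultiheadAttention` and the separate Q/K/V projections (instead of PyTorch's packed `in_proj_weight`) are illustrative assumptions, not the workaround actually used here:

```python
import math
import oneflow as flow
import oneflow.nn as nn


class SimpleMultiheadAttention(nn.Module):
    """Batch-first multi-head attention built from basic ops only (no fused kernel)."""

    def __init__(self, embed_dim, num_heads, dropout=0.0):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # Separate Q/K/V projections instead of PyTorch's packed in_proj_weight.
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.k_proj = nn.Linear(embed_dim, embed_dim)
        self.v_proj = nn.Linear(embed_dim, embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, query, key, value, attn_mask=None):
        # query/key/value: (batch, seq_len, embed_dim)
        b, tq, _ = query.shape
        tk = key.shape[1]

        def split_heads(x, t):
            # (b, t, e) -> (b, num_heads, t, head_dim)
            return x.reshape(b, t, self.num_heads, self.head_dim).permute(0, 2, 1, 3)

        q = split_heads(self.q_proj(query), tq)
        k = split_heads(self.k_proj(key), tk)
        v = split_heads(self.v_proj(value), tk)

        # Scaled dot-product attention per head.
        scores = flow.matmul(q, k.permute(0, 1, 3, 2)) / math.sqrt(self.head_dim)
        if attn_mask is not None:
            scores = scores + attn_mask  # additive mask, e.g. -inf at masked positions
        attn = self.dropout(flow.softmax(scores, dim=-1))

        out = flow.matmul(attn, v)                   # (b, h, tq, head_dim)
        out = out.permute(0, 2, 1, 3).reshape(b, tq, self.embed_dim)
        return self.out_proj(out), attn.mean(dim=1)  # output and head-averaged weights
```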

Introduction to MultiheadAttention

MultiheadAttention is a PyTorch module that implements the multi-head attention mechanism used in transformer architectures¹. By default it takes query, key, and value tensors of shape (seq_len, batch_size, embed_dim), or (batch_size, seq_len, embed_dim) when batch_first=True, and returns an output tensor with the same shape as the query.

The multi-head attention mechanism computes attention scores between different positions of the input sequence. It does this by running several attention heads in parallel, each with its own set of projection parameters¹.


Source: conversation with Bing, 2023/4/4.
(1) MultiHeadAttention实现详解 - 知乎. https://zhuanlan.zhihu.com/p/358206572 (accessed 2023/4/4).
(2) MultiHeadAttention实现详解 | Finisky Garden. https://finisky.github.io/2020/05/25/multiheadattention/ (accessed 2023/4/4).
(3) マルチヘッドアテンション (Multi-head Attention) [Transformerの部品]. https://cvml-expertguide.net/terms/dl/seq2seq-translation/transformer/multi-head-attention/ (accessed 2023/4/4).
(4) MultiheadAttention — PyTorch 2.0 documentation. https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html (accessed 2023/4/4).
(5) tf.keras.layers.MultiHeadAttention | TensorFlow v2.12.0. https://www.tensorflow.org/api_docs/python/tf/keras/layers/MultiHeadAttention (accessed 2023/4/4).
(6) MultiHeadAttention layer - Keras. https://keras.io/api/layers/attention_layers/multi_head_attention/ (accessed 2023/4/4).
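As a quick usage illustration of the module described above (the shapes below assume `batch_first=True`; with the default setting the batch and sequence dimensions are swapped):

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 256, 8
mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

x = torch.randn(4, 10, embed_dim)   # (batch_size, seq_len, embed_dim)
out, attn_weights = mha(x, x, x)    # self-attention: query = key = value

print(out.shape)           # torch.Size([4, 10, 256])
print(attn_weights.shape)  # torch.Size([4, 10, 10]), averaged over the 8 heads
```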

PyTorch

_native_multi_head_attention declaration

```yaml
# aten/src/ATen/native/native_functions.yaml
- func: _native_multi_head_attention(Tensor query, Tensor key, Tensor value, int embed_dim, int num_head, Tensor qkv_weight, Tensor qkv_bias, Tensor proj_weight, Tensor proj_bias, Tensor? mask=None, bool need_weights=True, bool average_attn_weights=True, int? mask_type=None) -> (Tensor, Tensor)
  variants: function
  dispatch:
    CPU, NestedTensorCPU: native_multi_head_attention_cpu
    CUDA, NestedTensorCUDA: native_multi_head_attention_cuda
  autogen: _native_multi_head_attention.out
```

- CPU implementation: `aten/src/ATen/native/transformers/attention.cpp`
- CUDA implementation: `aten/src/ATen/native/transformers/cuda/attention.cu`
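To connect this declaration to the public module, the sketch below maps `nn.MultiheadAttention`'s packed parameters onto the arguments listed in the yaml and compares the results. `_native_multi_head_attention` is an internal ATen op, so its availability and exact behavior may differ across PyTorch versions; treat this as an exploratory check, not a supported API.

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 64, 4
mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True).eval()
x = torch.randn(2, 5, embed_dim)     # (batch, seq, embed), as the native op expects

with torch.no_grad():
    ref, _ = mha(x, x, x, need_weights=False)      # public module path
    out, _ = torch._native_multi_head_attention(   # internal ATen op from the yaml above
        x, x, x,
        embed_dim, num_heads,
        mha.in_proj_weight,    # qkv_weight: (3*embed_dim, embed_dim), packed Q/K/V
        mha.in_proj_bias,      # qkv_bias:   (3*embed_dim,)
        mha.out_proj.weight,   # proj_weight
        mha.out_proj.bias,     # proj_bias
        None,                  # mask
        False,                 # need_weights
    )
    print(torch.allclose(ref, out, atol=1e-5))     # expected: True
```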

Reference