Open tensorfly-gpu opened 2 years ago
Thanks for the suggestion! We are evaluating this feature!
Thank you for your reply. I have high hopes for this interface and have already used it to build SwinResnet and validate it. Project page: SwinT-让Swin-Transformer的使用变得和CNN一样方便快捷! - 飞桨AI Studio (baidu.com). I used to think I could hardly write Swin-Transformer code in paddle myself; after taking your course, although I still don't think I could write it from scratch, I can at least locate where a program goes wrong and fix it. Thank you very much!
I would like paddle to add an nn.SwinT interface whose inputs and outputs are exactly the same as a convolution's. It could then serve as a drop-in replacement for any 2D convolution layer in any convolution-based model; because the input and output shapes match Conv2D exactly, it is very convenient. This would be especially useful in models that mix convolution and attention (currently an active research direction), so I very much hope PaddlePaddle will add this interface as nn.SwinT and optimize it. I have noticed that although ViT is maturing in CV, most practitioners are still unfamiliar with it and more comfortable with CNNs, so giving SwinT the same input/output interface as a CNN would greatly simplify everyday model writing and training.
Another point: we could use SwinT in the few most important layers of a model and convolution everywhere else, balancing efficiency and accuracy, but no such convenient interface exists today for this style of programming. Also, when a ViT classification model is transferred to a task whose output is an image, people easily get lost, whereas CNNs are very mature for segmentation and detection; this is another important reason.
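The shape contract being asked for can be made concrete with a small helper (`swint_output_shape` is a hypothetical name for illustration, not an existing paddle API): without downsampling the layer is shape-preserving, like a 'same'-padded convolution; with downsampling it halves the resolution and doubles the channels, like a stride-2 convolution with `2*c` filters.

```python
def swint_output_shape(shape, downsample=False):
    """Expected output shape of the proposed SwinT layer for an
    NCHW input, mirroring nn.Conv2D's shape contract.
    shape: (batch, channels, height, width)
    """
    b, c, h, w = shape
    if downsample:
        # Patch merging halves the resolution and doubles the channels,
        # like a stride-2 convolution with 2*c output filters.
        return (b, 2 * c, h // 2, w // 2)
    # Without downsampling the layer is shape-preserving,
    # like a 'same'-padded convolution.
    return (b, c, h, w)

# Matches the test run shown at the end of this post:
print(swint_output_shape((2, 48, 224, 224), downsample=True))  # (2, 96, 112, 112)
```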
Starting from SwinV1 in PaddleViT, I changed dot-product attention to cosine attention and pre-normalization to post-normalization, implementing two of the SwinV2 changes. I did not modify the relative position bias, since the layer is mainly intended to be used as conveniently as a CNN layer.
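For reference, the cosine-attention change replaces the usual dot-product scores `q @ k.T / sqrt(d)` with cosine similarity divided by a temperature. A minimal numpy sketch of just the score computation (single head; `tau` is a plain constant here, whereas SwinV2 learns a per-head temperature):

```python
import numpy as np

def cosine_attention_scores(q, k, tau=0.1):
    """Scaled cosine attention scores: cos(q_i, k_j) / tau.
    q: [n_q, d], k: [n_k, d]; returns [n_q, n_k]."""
    qn = q / (np.linalg.norm(q, axis=-1, keepdims=True) + 1e-6)
    kn = k / (np.linalg.norm(k, axis=-1, keepdims=True) + 1e-6)
    return (qn @ kn.T) / tau

def dot_product_scores(q, k):
    """Standard (SwinV1-style) scaled dot-product scores."""
    return (q @ k.T) / np.sqrt(q.shape[-1])

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((4, 8))
# Cosine scores are bounded by 1/tau regardless of feature magnitude,
# which is the training-stability motivation behind the SwinV2 change.
assert np.all(np.abs(cosine_attention_scores(q, k)) <= 1 / 0.1 + 1e-9)
```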
The code is below. I still have high hopes for this interface; since my coding skill is limited, if the official paddle developers are also interested, please feel free to optimize the code. Many thanks!
```python
import numpy as np
import paddle
import paddle.nn as nn


class DropPath(nn.Layer):
    """DropPath class

    Principle: as the name suggests, DropPath randomly drops multi-branch
    structures in a deep network.
    Use: generally added to a network as a regularizer, but it makes training
    harder. In NAS settings especially, if drop_prob is set too high the model
    may even fail to converge.
    """
    def __init__(self, drop_prob=None):
        super().__init__()
        self.drop_prob = drop_prob


class Identity(nn.Layer):
    """Identity layer

    The output of this layer is the input without any change.
    Use this layer to avoid if conditions in some forward methods.
    """
    def __init__(self):
        super(Identity, self).__init__()

    def forward(self, x):
        return x


class PatchEmbedding(nn.Layer):
    """Patch Embedding

    Apply patch embedding on input images. Embedding is implemented
    using a Conv2D op.

    Attributes:
        image_size: int, input image size, default: 224
        patch_size: int, size of patch, default: 4
        in_channels: int, input image channels, default: 3
        embed_dim: int, embedding dimension, default: 96
    """


class PatchMerging(nn.Layer):
    """Patch Merging class

    Merge multiple patches into one patch and keep the output dim.
    Specifically, merge adjacent 2x2 patches (dim=C) into 1 patch.
    The concatenated dim 4C is rescaled to 2C.

    Attributes:
        input_resolution: tuple of ints, the size of input
        dim: dimension of a single patch
        reduction: nn.Linear which maps 4C to 2C dim
        norm: nn.LayerNorm, applied after linear layer
    """


class Mlp(nn.Layer):
    """MLP module

    Implemented using nn.Linear; activation is GELU, dropout is applied.
    Ops: fc -> act -> dropout -> fc -> dropout

    Attributes:
        fc1: nn.Linear
        fc2: nn.Linear
        act: GELU
        dropout1: dropout after fc1
        dropout2: dropout after fc2
    """


def windows_partition(x, window_size):
    """Partition into windows of window_size x window_size

    Args:
        x: Tensor, shape=[b, h, w, c]
        window_size: int, window size
    Returns:
        x: Tensor, shape=[num_windows*b, window_size, window_size, c]
    """


def windows_reverse(windows, window_size, H, W):
    """Window reverse

    Args:
        windows: (n_windows * B, window_size, window_size, C)
        window_size: (int) window size
        H: (int) height of image
        W: (int) width of image
    Returns:
        x: (B, H, W, C)
    """
```
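The two window helpers are pure reshape/transpose operations. A self-contained numpy sketch of the same logic (the paddle version would use `paddle.reshape`/`paddle.transpose` with identical axis orders):

```python
import numpy as np

def windows_partition(x, window_size):
    """[B, H, W, C] -> [num_windows*B, window_size, window_size, C]."""
    B, H, W, C = x.shape
    x = x.reshape(B, H // window_size, window_size, W // window_size, window_size, C)
    # Bring the two window-grid axes together, then flatten them into the batch.
    x = x.transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(-1, window_size, window_size, C)

def windows_reverse(windows, window_size, H, W):
    """[num_windows*B, window_size, window_size, C] -> [B, H, W, C]."""
    B = windows.shape[0] // ((H // window_size) * (W // window_size))
    x = windows.reshape(B, H // window_size, W // window_size, window_size, window_size, -1)
    # Undo the transpose from windows_partition.
    x = x.transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(B, H, W, -1)

# Round trip: reverse(partition(x)) recovers x exactly.
x = np.arange(2 * 8 * 8 * 3).reshape(2, 8, 8, 3).astype(np.float32)
w = windows_partition(x, 4)
assert w.shape == (2 * 4, 4, 4, 3)
assert np.array_equal(windows_reverse(w, 4, 8, 8), x)
```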
```python
class WindowAttention(nn.Layer):
    """Window-based multi-head attention, with relative position bias.

    Both shifted window and non-shifted window are supported.

    Attributes:
        dim: int, input dimension (channels)
        window_size: int, height and width of the window
        num_heads: int, number of attention heads
        qkv_bias: bool, if True, enable learnable bias to q, k, v, default: True
        qk_scale: float, override default qk scale head_dim**-0.5 if set, default: None
        attention_dropout: float, dropout of attention
        dropout: float, dropout for output
    """


class SwinTransformerBlock(nn.Layer):
    """Swin transformer block

    Contains window multi-head self attention, droppath, mlp, norm and residual.

    Attributes:
        dim: int, input dimension (channels)
        input_resolution: tuple, input resolution
    """


class SwinT(nn.Layer):
```
The test code is as follows:

```python
tmp = paddle.to_tensor(np.random.rand(2, 48, 224, 224), dtype='float32')
print(tmp.shape)
sts = SwinT(in_channels=48, input_resolution=(224, 224), num_heads=8,
            window_size=8, qkv_bias=False, qk_scale=None, dropout=0.1,
            attention_dropout=0.1, droppath=0.1, downsample=True)
out = sts(tmp)
print(out.shape)
```

The output is as follows:

```
[2, 48, 224, 224]
[2, 96, 112, 112]
```