BR-IDL / PaddleViT

:robot: PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2.0+
https://github.com/BR-IDL/PaddleViT
Apache License 2.0

Hardcoded hyperparameters in the ViT Transformer Encoder #76

Closed libertatis closed 2 years ago

libertatis commented 2 years ago

In PaddleViT/image_classification/ViT/transformer.py at line 300, the parameters qkv_bias, mlp_ratio, dropout, and attention_dropout are hardcoded when each EncoderLayer is created, so the corresponding arguments passed to Encoder.__init__ have no effect:

class Encoder(nn.Layer):
    """Transformer encoder
    Encoder encoder contains a list of EncoderLayer, and a LayerNorm.
    Attributes:
        layers: nn.LayerList contains multiple EncoderLayers
        encoder_norm: nn.LayerNorm which is applied after last encoder layer
    """
    def __init__(self,
                 embed_dim,
                 num_heads,
                 depth,
                 qkv_bias=True,
                 mlp_ratio=4.0,
                 dropout=0.,
                 attention_dropout=0.,
                 droppath=0.):
        super(Encoder, self).__init__()
        # stochastic depth decay
        depth_decay = [x.item() for x in paddle.linspace(0, droppath, depth)]
        layer_list = []
        for i in range(depth):
            encoder_layer = EncoderLayer(embed_dim,
                                         num_heads,
                                         qkv_bias=True,
                                         mlp_ratio=4.,
                                         dropout=0.,
                                         attention_dropout=0.,
                                         droppath=depth_decay[i])
            layer_list.append(copy.deepcopy(encoder_layer))
        self.layers = nn.LayerList(layer_list)
……

It should be changed to:

class Encoder(nn.Layer):
    """Transformer encoder
    Encoder encoder contains a list of EncoderLayer, and a LayerNorm.
    Attributes:
        layers: nn.LayerList contains multiple EncoderLayers
        encoder_norm: nn.LayerNorm which is applied after last encoder layer
    """
    def __init__(self,
                 embed_dim,
                 num_heads,
                 depth,
                 qkv_bias=True,
                 mlp_ratio=4.0,
                 dropout=0.,
                 attention_dropout=0.,
                 droppath=0.):
        super(Encoder, self).__init__()
        # stochastic depth decay
        depth_decay = [x.item() for x in paddle.linspace(0, droppath, depth)]
        layer_list = []
        for i in range(depth):
            encoder_layer = EncoderLayer(embed_dim,
                                         num_heads,
                                         qkv_bias=qkv_bias,
                                         mlp_ratio=mlp_ratio,
                                         dropout=dropout,
                                         attention_dropout=attention_dropout,
                                         droppath=depth_decay[i])
            layer_list.append(copy.deepcopy(encoder_layer))
        self.layers = nn.LayerList(layer_list)
……

That's all~
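The effect of the bug can be reproduced with a framework-free sketch (the Layer class below is a hypothetical stand-in for EncoderLayer, not PaddleViT code):

```python
import copy

class Layer:
    """Stand-in for EncoderLayer: just records the kwargs it was built with."""
    def __init__(self, dropout=0.0, attention_dropout=0.0):
        self.dropout = dropout
        self.attention_dropout = attention_dropout

def build_buggy(depth, dropout=0.0, attention_dropout=0.0):
    # Bug: literal defaults are passed instead of the constructor arguments.
    return [copy.deepcopy(Layer(dropout=0.0, attention_dropout=0.0))
            for _ in range(depth)]

def build_fixed(depth, dropout=0.0, attention_dropout=0.0):
    # Fix: forward the arguments actually received by __init__.
    return [copy.deepcopy(Layer(dropout=dropout,
                                attention_dropout=attention_dropout))
            for _ in range(depth)]

buggy = build_buggy(3, dropout=0.1, attention_dropout=0.1)
fixed = build_fixed(3, dropout=0.1, attention_dropout=0.1)
print([layer.dropout for layer in buggy])  # [0.0, 0.0, 0.0] -- 0.1 silently ignored
print([layer.dropout for layer in fixed])  # [0.1, 0.1, 0.1]
```

In the buggy version the dropout value requested by the caller never reaches any sublayer, which is exactly what happens in the original Encoder.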

xperzy commented 2 years ago

Thanks for the issue! The bug is fixed now, so I'm closing this issue. https://github.com/BR-IDL/PaddleViT/blob/42360a611eafa82560d8219fb45596094e738cfb/image_classification/ViT/transformer.py#L300-L306