BR-IDL / PaddleViT

:robot: PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2.0+
https://github.com/BR-IDL/PaddleViT
Apache License 2.0

Question about the deep copy of encoder_layer in the ViT Transformer Encoder #78

Closed libertatis closed 2 years ago

libertatis commented 2 years ago

In PaddleViT/image_classification/ViT/transformer.py, what is the purpose of the deep copy when the Encoder creates its encoder_layer instances during initialization?

import copy

import paddle
import paddle.nn as nn

# EncoderLayer is defined elsewhere in transformer.py.
class Encoder(nn.Layer):
    def __init__(self,
                 embed_dim,
                 num_heads,
                 depth,
                 qkv_bias=True,
                 mlp_ratio=4.0,
                 dropout=0.,
                 attention_dropout=0.,
                 droppath=0.):
        super(Encoder, self).__init__()
        # stochastic depth decay
        depth_decay = [x.item() for x in paddle.linspace(0, droppath, depth)]
        layer_list = []
        for i in range(depth):
            encoder_layer = EncoderLayer(embed_dim,
                                         num_heads,
                                         qkv_bias=qkv_bias,
                                         mlp_ratio=mlp_ratio,
                                         dropout=dropout,
                                         attention_dropout=attention_dropout,
                                         droppath=depth_decay[i])
            layer_list.append(copy.deepcopy(encoder_layer))  # what is the purpose of deep-copying encoder_layer here?
        self.layers = nn.LayerList(layer_list)
...
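As a side note, depth_decay above linearly ramps the per-layer drop-path rate from 0 up to droppath, so deeper layers are dropped more often. A quick check of the schedule (a sketch, assuming a working Paddle install):

    import paddle

    # Drop-path rates for droppath=0.1, depth=4: the rate grows linearly
    # with layer index, so shallower layers are dropped less often.
    rates = [x.item() for x in paddle.linspace(0, 0.1, 4)]
    print(rates)  # [0.0, 0.0333..., 0.0666..., 0.1]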

The for loop creates a fresh encoder_layer on every iteration, so each one is already a distinct object and there is no parameter sharing between layers. What consideration motivates the deep copy here? Personally, I think deep-copying encoder_layer is unnecessary, and appending it directly should be enough:

layer_list.append(encoder_layer)

Maybe I haven't thought about this deeply enough; looking forward to an official explanation.
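For reference, a minimal sketch of the check behind this reasoning (not from the original thread; it assumes a current Paddle install and uses nn.Linear as a stand-in for EncoderLayer):

    import paddle.nn as nn

    # Build two layers in a loop, then confirm they are distinct objects
    # with distinct (non-shared) parameters.
    layers = [nn.Linear(4, 4) for _ in range(2)]
    print(layers[0] is layers[1])                # False: distinct modules
    print(layers[0].weight is layers[1].weight)  # False: no parameter sharing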

xperzy commented 2 years ago

This deep copy dates back to early Paddle 2.0 releases, where building encoder layers through a Python list could behave like a shallow copy: the appended layers were not fully independent new objects, which led to incorrect results at run time. That problem has been fixed in newer versions of Paddle, so there is no need to worry about it today; the code works the same with or without the deep copy.
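In other words, on a recent Paddle release the layers can be appended directly, along these lines (a sketch, with nn.Linear standing in for EncoderLayer):

    import paddle
    import paddle.nn as nn

    class TinyEncoder(nn.Layer):
        """Sketch of the list-based construction without copy.deepcopy."""
        def __init__(self, embed_dim, depth):
            super().__init__()
            self.layers = nn.LayerList(
                [nn.Linear(embed_dim, embed_dim) for _ in range(depth)])

        def forward(self, x):
            for layer in self.layers:
                x = layer(x)
            return x

    out = TinyEncoder(embed_dim=8, depth=3)(paddle.randn([2, 8]))

Wrapping the list in nn.LayerList still matters: it registers the sublayers so their parameters appear in the model's state dict, which a plain Python list would not do.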

This issue has been explained above, so I am closing it.