Could you please explain what Num_Transformer_Layers means? (#2)
Open · lonngxiang opened this issue 3 years ago
Got it, thanks! I had just confused multi-head attention with multiple layers and treated them as the same thing.
------------------ Original message (Sunday, Jul 25, 2021) ------------------
It's the number of stacked Transformer encoders. Because a Transformer encoder in BERT has identical input and output dimensions, each pass through an encoder applies one more round of "deeper" encoding to the input. Google's BERT stacks 12 Transformer layers.
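For anyone landing here later, here is a minimal sketch of what Num_Transformer_Layers controls, written with plain tf.keras built-ins. This is not the code from this repo: the class name, the lowercase num_transformer_layers variable, and the BERT-base hyperparameter defaults (hidden size 768, 12 heads, feed-forward size 3072) are illustrative assumptions.

```python
import tensorflow as tf

class TransformerEncoderLayer(tf.keras.layers.Layer):
    """One encoder block: multi-head self-attention + feed-forward,
    each with a residual connection and layer norm. Input and output
    both have shape (batch, seq_len, hidden_size), which is exactly
    what makes the blocks stackable."""
    def __init__(self, hidden_size=768, num_heads=12, ff_size=3072, **kwargs):
        super().__init__(**kwargs)
        self.attention = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=hidden_size // num_heads)
        self.attn_norm = tf.keras.layers.LayerNormalization(epsilon=1e-12)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(ff_size, activation="gelu"),
            tf.keras.layers.Dense(hidden_size),
        ])
        self.ffn_norm = tf.keras.layers.LayerNormalization(epsilon=1e-12)

    def call(self, x):
        x = self.attn_norm(x + self.attention(x, x))  # self-attention sublayer
        return self.ffn_norm(x + self.ffn(x))         # feed-forward sublayer

# num_transformer_layers is how many identical blocks get stacked.
num_transformer_layers = 12  # BERT-base value
encoder_stack = [TransformerEncoderLayer() for _ in range(num_transformer_layers)]

x = tf.random.normal((2, 16, 768))  # (batch, seq_len, hidden_size)
for layer in encoder_stack:
    x = layer(x)                    # shape is unchanged at every step
print(x.shape)                      # (2, 16, 768)
```

Because every block maps (batch, seq_len, hidden_size) back to the same shape, the loop can stack any number of them; BERT-base sets this count to 12 and BERT-large to 24. Multi-head attention, by contrast, is a split *inside* each block (num_heads above), which is the distinction the question was about.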