baichuan-inc / Baichuan-7B

A large-scale 7B pretraining language model developed by BaiChuan-Inc.
https://huggingface.co/baichuan-inc/baichuan-7B
Apache License 2.0

Why does the baichuan model only have a DecoderLayer and no EncoderLayer? #80

Closed · monkeyshichi closed 1 year ago

monkeyshichi commented 1 year ago


Questions

Why does the baichuan model only have a DecoderLayer and no EncoderLayer? Doesn't the Transformer architecture also include an encoder? This line of code builds only DecoderLayers:

self.layers = nn.ModuleList([DecoderLayer(config) for _ in range(config.num_hidden_layers)])

class Model(PreTrainedModel):
    """
    Transformer decoder consisting of config.num_hidden_layers layers. Each layer is a [DecoderLayer].

    Args:
        config: BaiChuanConfig
    """

    def __init__(self, config: BaiChuanConfig):
        super().__init__(config)
        self.padding_idx = config.pad_token_id
        self.vocab_size = config.vocab_size

        # Token embeddings, the decoder stack, and the final RMSNorm.
        self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
        self.layers = nn.ModuleList([DecoderLayer(config) for _ in range(config.num_hidden_layers)])
        self.norm = RMSNorm(config.hidden_size, eps=config.rms_norm_eps)

        self.gradient_checkpointing = False
        # Initialize weights and apply final processing
        self.post_init()


ninehills commented 1 year ago

Because this is a pure decoder-only model.

Essentially all of today's popular models are decoder-only, including GPT, LLaMA, Falcon, BLOOM, and so on.
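To make the distinction concrete, below is a minimal, hypothetical sketch (not the repository's actual code; all names here are illustrative) of a decoder-only layer: it applies causally masked self-attention over its own inputs and has no cross-attention block reading encoder outputs, which is exactly why no EncoderLayer appears in the model.

import torch
import torch.nn as nn

def causal_mask(seq_len: int) -> torch.Tensor:
    # Boolean mask for nn.MultiheadAttention: True marks positions that may
    # NOT be attended to, so each token sees only itself and earlier tokens.
    return ~torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

class DecoderOnlyLayer(nn.Module):
    # One causal self-attention block plus an MLP. An encoder-decoder layer
    # would additionally contain a cross-attention block over encoder states;
    # decoder-only models drop the encoder and cross-attention entirely.
    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, 4 * hidden_size),
            nn.GELU(),
            nn.Linear(4 * hidden_size, hidden_size),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.self_attn(x, x, x, attn_mask=causal_mask(x.size(1)))
        x = x + attn_out           # residual connection
        return x + self.mlp(x)     # residual connection

x = torch.randn(1, 8, 64)                    # (batch, seq_len, hidden_size)
print(DecoderOnlyLayer(64, 4)(x).shape)      # torch.Size([1, 8, 64])

The causal mask is what lets a single stack of such layers both "read" the prompt and generate the continuation, so a separate encoder is unnecessary for language modeling.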

monkeyshichi commented 1 year ago

Got it, thanks!