May I ask why the VQ-VAE's learned codebook was not used as the embedding layer for the autoregressive language-model training? Instead, only the codebook indices are used, and a new learnable embedding layer is created for GPT, as below:
self.tok_embeddings = nn.Embedding(config.vocab_size, config.dim)
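For concreteness, the alternative I have in mind would be something like the sketch below, which copies the trained codebook weights into the GPT token embedding as an initialization (all names and sizes here are hypothetical; the real codebook would come from the trained quantizer, and the codebook dimension would have to match `config.dim`):

```python
import torch
import torch.nn as nn

# Hypothetical sizes; in practice these come from the model config.
vocab_size, dim = 16384, 768

# Stand-in for the learned VQ-VAE codebook. In a real run this would
# be loaded from the trained quantizer's embedding table.
codebook = nn.Embedding(vocab_size, dim)

# GPT token embedding, created fresh as in the repo.
tok_embeddings = nn.Embedding(vocab_size, dim)

# The alternative in question: initialize (or tie) the GPT embedding
# with the frozen codebook weights instead of training it from scratch.
with torch.no_grad():
    tok_embeddings.weight.copy_(codebook.weight)
```

This is only meant to illustrate the question, not the repo's actual design.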