dongcy-AHU opened this issue 3 years ago
Hello, may I ask, how is the [CLS] token in multi-layer self-attention initialized? What is its vector dimension?
Hi,
Thank you for your attention. It's just a torch.nn.Embedding with 768 dimensions; the initialization function is torch.nn.init.normal_. Just take a look at the source and do a little searching.
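For reference, here is a minimal sketch of that setup: a one-entry embedding table holding the [CLS] vector, initialized with torch.nn.init.normal_ and prepended to the token embeddings before the self-attention layers. The class name and the std=0.02 value are just illustrative assumptions, not taken from the actual source:

```python
import torch
import torch.nn as nn

class ClsTokenExample(nn.Module):
    """Minimal sketch of a learnable [CLS] token (hidden size 768)."""

    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        # One-entry embedding table whose single row is the [CLS] vector.
        self.cls_embedding = nn.Embedding(1, hidden_dim)
        # Normal initialization, as mentioned above; std=0.02 is an assumption.
        nn.init.normal_(self.cls_embedding.weight, std=0.02)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, hidden_dim)
        batch_size = token_embeddings.size(0)
        cls = self.cls_embedding.weight.unsqueeze(0).expand(batch_size, -1, -1)
        # Prepend [CLS] so every self-attention layer can attend to it.
        return torch.cat([cls, token_embeddings], dim=1)

if __name__ == "__main__":
    tokens = torch.randn(2, 10, 768)   # fake token embeddings
    out = ClsTokenExample()(tokens)
    print(out.shape)                   # torch.Size([2, 11, 768])
```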