Kwai-Kolors / Kolors


Why is the vocab size of the tokenizer not equal to the embedding shape of text_encoder? #75

Open joey0922 opened 1 month ago

joey0922 commented 1 month ago

I found that the text_encoder's embedding shape is 65024, while the vocab size of the tokenizer is 64796. Isn't it necessary for these two values to be equal? If so, how should I initialize the embeddings of newly added special tokens?
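
For reference, here is a minimal sketch (not from the Kolors codebase) of one common way to inspect the mismatch and initialize embeddings for newly added special tokens, assuming the text encoder and tokenizer follow the Hugging Face `PreTrainedModel`/`PreTrainedTokenizer` API. The checkpoint path and the token names are placeholders, and initializing new rows from the mean of the existing embeddings is just one possible choice:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder path to the Kolors text_encoder / tokenizer checkpoint.
ckpt = "path/to/kolors/text_encoder"
tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)
text_encoder = AutoModel.from_pretrained(ckpt, trust_remote_code=True)

emb = text_encoder.get_input_embeddings()        # nn.Embedding
print(len(tokenizer), emb.weight.shape[0])       # e.g. 64796 vs 65024

# Hypothetical new special tokens; their ids are appended after the
# current vocab, so they still fall inside the larger embedding matrix.
new_tokens = ["<new_tok_1>", "<new_tok_2>"]
num_added = tokenizer.add_special_tokens({"additional_special_tokens": new_tokens})

with torch.no_grad():
    # Initialize each new row with the mean of the original embeddings.
    mean_vec = emb.weight[: len(tokenizer) - num_added].mean(dim=0)
    for tok in new_tokens:
        emb.weight[tokenizer.convert_tokens_to_ids(tok)] = mean_vec
```

If the new token ids were ever to exceed the embedding matrix size, `text_encoder.resize_token_embeddings(len(tokenizer))` could be used to grow it first; here the matrix is already larger than the vocab, so the appended ids fit as-is.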