aliyun / aicb


why the word embedding shape need to be multiply by 2? #10

Closed xiongjun19 closed 1 month ago

xiongjun19 commented 1 month ago

I'm a little confused about the shape of the word embedding in the Megatron model. Why does the number of embeddings need to be multiplied by two: `self.word_embedding = MockedParam((2 * num_embedding_per_partition, hidden_size), name=self.name)`

zhouheyang-alibaba commented 1 month ago

The dimension of word_embedding is num_embedding_per_partition * hidden_size * params_dtype , where the value 2 indicates that the data type corresponds to 2 bytes. However, after carefully reading the code, I found that the data type of word_embedding is float32, which should correspond to 4 bytes. I will modify this to 4. Future versions will further support custom dtype.
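The arithmetic described above can be sketched as follows. This is a minimal illustration of the size calculation, not the actual AICB code; the helper name and dtype table are hypothetical:

```python
# Hypothetical helper (not from the AICB source): the memory footprint of one
# embedding shard is rows * hidden_size * bytes-per-element of the dtype.
DTYPE_BYTES = {"float16": 2, "bfloat16": 2, "float32": 4}

def word_embedding_bytes(num_embedding_per_partition: int,
                         hidden_size: int,
                         dtype: str = "float32") -> int:
    """Return the byte size of one word-embedding partition."""
    return num_embedding_per_partition * hidden_size * DTYPE_BYTES[dtype]

# Example: a 50304-row vocab shard with hidden size 4096.
fp32_size = word_embedding_bytes(50304, 4096, "float32")  # 4 bytes/element
fp16_size = word_embedding_bytes(50304, 4096, "float16")  # 2 bytes/element
```

So the literal `2` in the original shape was standing in for a 2-byte dtype; with float32 parameters the multiplier should be 4, which is why the fp32 shard above is exactly twice the size of the fp16 one.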

xiongjun19 commented 1 month ago

> The dimension of word_embedding is num_embedding_per_partition * hidden_size * params_dtype, where the value 2 indicates that the data type corresponds to 2 bytes. However, after carefully reading the code, I found that the data type of word_embedding is float32, which should correspond to 4 bytes. I will modify this to 4. Future versions will further support custom dtype.

Great, I understand now. Thanks for your great work!