Closed xiongjun19 closed 1 month ago
The size of word_embedding is computed as num_embedding_per_partition * hidden_size * params_dtype, where the factor 2 assumes the data type occupies 2 bytes per element. However, after carefully reading the code, I found that word_embedding is actually float32, which occupies 4 bytes per element, so I will change the factor to 4. Future versions will add support for custom dtypes.
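A minimal sketch of the idea being discussed: the dtype byte-factor is folded into the first dimension of a mocked parameter's shape, so the element count of the mocked shape equals the parameter's memory footprint in bytes. `MockedParam` here is a simplified stand-in, not the project's actual class, and the sizes are made-up example values.

```python
from functools import reduce
import operator

class MockedParam:
    """Simplified stand-in: records a shape whose element product
    equals the parameter's memory footprint in bytes."""
    def __init__(self, shape, name=None):
        self.shape = shape
        self.name = name

    def num_bytes(self):
        # Product of all shape dimensions = bytes, because the
        # bytes-per-element factor is baked into the first dimension.
        return reduce(operator.mul, self.shape, 1)

BYTES_PER_PARAM = 4  # float32; the original code hard-coded 2 (an fp16 assumption)

# Hypothetical example sizes
num_embedding_per_partition = 50304
hidden_size = 1024

word_embedding = MockedParam(
    (BYTES_PER_PARAM * num_embedding_per_partition, hidden_size),
    name="word_embedding",
)

# Matches the real fp32 table: num_embedding_per_partition * hidden_size * 4 bytes
assert word_embedding.num_bytes() == num_embedding_per_partition * hidden_size * 4
```

Folding the byte factor into the shape lets downstream accounting code simply multiply shape dimensions to get memory usage, at the cost of the shape no longer matching the real tensor's dimensions, which is exactly what prompted the question above.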
Great, I understand now. Thanks for your great work!
I'm a little confused about the shape of the word embedding in the Megatron model. Why does the number of embeddings need to be multiplied by two?

self.word_embedding = MockedParam((2 * num_embedding_per_partition, hidden_size), name=self.name)