google-research / albert

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Where is the function of Factorized Embedding Parameterization? #210

Open hntee opened 4 years ago

hntee commented 4 years ago

Hi all, I read the paper and some of the code. The paper indicates that there is an intermediate matrix (E) that factorizes the V -> H embedding lookup table into V -> E -> H matrices. However, the code at https://github.com/google-research/albert/blob/c21d8a3616a4b156d21e795698ad52743ccd8b73/modeling.py#L199-L206 seems to map the embedding directly from the input tensor.

So where is the intermediate (V, E) matrix? Am I missing something?

asharma20 commented 4 years ago

I have the same question and would also like confirmation, but from what I understand the shapes of self.word_embedding_output, self.output_embedding_table, and self.embedding_output are all with respect to the embedding size E. Since the code works on batches, instead of an explicit (V, E) matrix you are looking for the (batch_size, seq_length, E) tensor, which is self.embedding_output. This is later fed into the transformer model and projected to the hidden size, (batch_size, seq_length, H), in the following: https://github.com/google-research/albert/blob/c21d8a3616a4b156d21e795698ad52743ccd8b73/modeling.py#L1085-L1087
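
To make the factorization concrete, here is a minimal NumPy sketch of the two-step V -> E -> H mapping described above. It is illustrative only, not the repo's TensorFlow code; the variable names are hypothetical, and the sizes match the ALBERT-base defaults (V=30000, E=128, H=768).

```python
import numpy as np

# Hypothetical sizes: vocab V, embedding E, hidden H (ALBERT-base defaults).
V, E, H = 30000, 128, 768
batch_size, seq_length = 2, 16

rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(V, E))  # the factorized (V, E) lookup table
projection = rng.normal(size=(E, H))       # the (E, H) projection matrix

input_ids = rng.integers(0, V, size=(batch_size, seq_length))

# Step 1: embedding lookup -> (batch_size, seq_length, E).
# This corresponds to self.embedding_output in modeling.py.
word_embeddings = embedding_table[input_ids]

# Step 2: project up to the hidden size -> (batch_size, seq_length, H).
# In the repo this happens inside the transformer (the lines linked above).
hidden_input = word_embeddings @ projection

print(word_embeddings.shape)  # (2, 16, 128)
print(hidden_input.shape)     # (2, 16, 768)

# Parameter count: V*E + E*H = 3,938,304 vs. an unfactorized V*H = 23,040,000.
```

So the (V, E) matrix exists explicitly as the embedding table; only its batched output, shaped (batch_size, seq_length, E), is what you see at the lines in the question, and the E -> H projection is deferred to the transformer.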

hntee commented 4 years ago

Thanks @asharma20 !