dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0

How does weight tying work between src_embed and tgt_embed in NMTModel? #996

Closed zeeshansayyed closed 4 years ago

zeeshansayyed commented 4 years ago

Hi, I have a question regarding weight tying between the encoder and decoder embedding matrices of the NMTModel. Consider these lines of the NMTModel class.

As described in Issue #7785 of the mxnet repo and this section in d2l.ai, we usually tie weights by constructing the new layer with its params argument set to the params of the layer whose weights we want to tie to.
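
For example, that pattern looks roughly like this (a minimal sketch with the MXNet 1.x Gluon API and made-up sizes, not code from gluon-nlp):

```python
from mxnet.gluon import nn

vocab_size, emb_size = 10000, 512  # hypothetical sizes, just for illustration

# Build the source embedding first, then construct the target embedding with
# params=src_embed.params so both Blocks resolve to the same weight Parameter.
src_embed = nn.Embedding(vocab_size, emb_size)
tgt_embed = nn.Embedding(vocab_size, emb_size, params=src_embed.params)

src_embed.initialize()
print(tgt_embed.weight is src_embed.weight)  # True: one shared Parameter
```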

But in the NMTModel class referenced above, the references are simply assigned in lines 98 and 101 as self.src_embed = src_embed and self.tgt_embed = self.src_embed.
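
That is, something along these lines (a condensed toy version of the pattern, not the actual NMTModel code):

```python
from mxnet.gluon import nn

class TinyNMTModel(nn.HybridBlock):
    """Toy stand-in for NMTModel, only to show the assignment pattern."""
    def __init__(self, src_embed, **kwargs):
        super(TinyNMTModel, self).__init__(**kwargs)
        # Both attributes end up pointing at the very same Block object.
        self.src_embed = src_embed       # like line 98
        self.tgt_embed = self.src_embed  # like line 101

    def hybrid_forward(self, F, src, tgt):
        return self.src_embed(src), self.tgt_embed(tgt)

src_embed = nn.HybridSequential()
src_embed.add(nn.Embedding(10000, 512))  # made-up sizes
model = TinyNMTModel(src_embed)
print(model.tgt_embed is model.src_embed)  # True: same object, two names
```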

Is this also a valid way of tying weights?

Note: I have also opened an issue in the mxnet repo since I didn't know which was the right place to ask. I will close the one which is not needed.

leezu commented 4 years ago

Hi @zeeshansayyed, in this case self.tgt_embed and self.src_embed are just two different names for the same nn.HybridSequential. So rather than sharing parameters, they simply are the same Block.

You are right that for sharing parameters between separate Blocks, you need to pass the ParameterDict of the other Block via the params argument. With respect to the NMTModel, you can see an example of sharing between the TransformerDecoder used during training and the TransformerOneStepDecoder used during testing at: https://github.com/dmlc/gluon-nlp/blob/57a45aaf7e82a826e1bffb133c328f913844bd4c/src/gluonnlp/model/transformer.py#L1202-L1206
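
Schematically, that second pattern looks like this (hypothetical names and plain Dense blocks, not the actual transformer.py code):

```python
from mxnet.gluon import nn

# Two *separate* Blocks: here sharing has to go through the params argument,
# e.g. by handing the first Block's ParameterDict to the second one.
train_decoder = nn.Dense(512, in_units=512, prefix='dec_')
test_decoder = nn.Dense(512, in_units=512, prefix='dec_',
                        params=train_decoder.collect_params())

train_decoder.initialize()
print(test_decoder.weight is train_decoder.weight)  # True: shared Parameter
```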

I'll close the issue for now, but please feel free to reopen if you have further questions.

zeeshansayyed commented 4 years ago

Thank you.