Closed by zeeshansayyed 4 years ago
Hi @zeeshansayyed, in this case `self.tgt_embed` and `self.src_embed` are just two different names for the same `nn.HybridSequential`. So you could say that they don't share parameters, but rather are the same object.
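A framework-free sketch may make this concrete. The classes below are hypothetical stand-ins for the gluon Blocks (they are not the real mxnet API): when two attributes are bound to the same object, there is nothing to "share" via `params`, because the two names already refer to one object.

```python
class FakeEmbedding:
    """Hypothetical stand-in for an nn.HybridSequential embedding block."""
    def __init__(self):
        self.weight = [0.0] * 4  # pretend parameter tensor


class FakeNMTModel:
    """Hypothetical stand-in mirroring the src_embed/tgt_embed assignment."""
    def __init__(self):
        embed = FakeEmbedding()
        self.src_embed = embed
        self.tgt_embed = self.src_embed  # same object, just a second name


model = FakeNMTModel()
print(model.src_embed is model.tgt_embed)  # True: identical objects
model.src_embed.weight[0] = 1.5
print(model.tgt_embed.weight[0])           # 1.5: a change through one name
                                           # is visible through the other
```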
You are right that for sharing parameters between separate `Block`s, you need to pass the `ParameterDict` of the other `Block` via the `params` argument. With respect to the `NMTModel`, you can see an example of sharing between the `TransformerDecoder` used during training and the `TransformerOneStepDecoder` used during testing at: https://github.com/dmlc/gluon-nlp/blob/57a45aaf7e82a826e1bffb133c328f913844bd4c/src/gluonnlp/model/transformer.py#L1202-L1206
I'll close the issue for now, but please feel free to reopen if you have further questions.
Thank you.
Hi, I have a question regarding weight tying between the encoder and decoder embedding matrices of the `NMTModel`. Consider these lines of the `NMTModel` class.
As described in Issue #7785 of the mxnet repo and this section in d2l.ai, we usually do this by passing the `params` of the layer whose weights we want to tie to the `params` argument of the new layer. But in the `NMTModel` class referenced above, we are simply assigning references in lines 98 and 101 as `self.src_embed = src_embed` and `self.tgt_embed = self.src_embed`. Is this also a valid way of tying weights?
Note: I have also opened an issue in the mxnet repo since I didn't know which was the right place to ask. I will close the one which is not needed.