Hi, I have a question regarding weight tying between the encoder and decoder embedding matrices in the NMTModel. Consider these lines of the NMTModel class.
As described in Issue #7785 and this section in d2l.ai, we usually tie weights by passing the `params` argument when constructing the new layer, handing it the `params` of the layer whose weights we want to tie to.
But in the NMTModel class referenced above, the code simply assigns references on lines 98 and 101: `self.src_embed = src_embed` and `self.tgt_embed = self.src_embed`.
Is this also a valid way of tying weights?
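To make the reference-assignment mechanics concrete, here is a minimal pure-Python sketch. The `Embedding` class below is a hypothetical stand-in for `gluon.nn.Embedding` (not the real MXNet class), used only to show that binding two attribute names to the same object means there is a single underlying weight array, so an update through either name is visible through both:

```python
class Embedding:
    """Illustrative stand-in for gluon.nn.Embedding (not the real class)."""
    def __init__(self, vocab_size, embed_dim):
        # Deterministic fake weights; a real layer would initialize randomly.
        self.weight = [[0.0] * embed_dim for _ in range(vocab_size)]

    def __call__(self, idx):
        # Look up the embedding row for a token index.
        return self.weight[idx]

src_embed = Embedding(vocab_size=100, embed_dim=16)
tgt_embed = src_embed  # reference assignment, as in NMTModel

# Both names point at the same object, hence the same weight storage.
assert tgt_embed is src_embed
assert tgt_embed.weight is src_embed.weight

# An update applied through one name is seen through the other.
src_embed.weight[0][0] = 1.0
assert tgt_embed.weight[0][0] == 1.0
```

Under this reading, assigning `self.tgt_embed = self.src_embed` ties the weights by object identity rather than through Gluon's `params`-sharing mechanism; whether the two approaches behave identically for things like prefixes and parameter naming is worth confirming against the Gluon docs.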