apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org

How does weight tying work in MxNet? #16684

Closed. zeeshansayyed closed this issue 4 years ago.

zeeshansayyed commented 4 years ago

Hi, I have a question about tying the weights of the encoder and decoder embedding matrices in the NMTModel. Consider these lines of the NMTModel class.

As described in issue #7785 and this section of d2l.ai, weight tying is usually done by constructing the new layer with the `params` argument, passing in the parameters of the layer whose weights we want to share.
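For example, here is a minimal sketch of the `params`-based approach (vocabulary and embedding sizes are made up for illustration):

```python
from mxnet.gluon import nn

vocab_size, embed_size = 10000, 300  # hypothetical sizes, for illustration only

embedding = nn.Embedding(vocab_size, embed_size)

# Constructing the Dense layer with the embedding's ParameterDict makes its
# 'weight' parameter resolve to the embedding's existing weight instead of
# allocating a new one; only the Dense bias is created fresh.
decoder = nn.Dense(vocab_size, flatten=False, params=embedding.params)

# Both attributes refer to the same Parameter object.
print(decoder.weight is embedding.weight)  # True
```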

But in the NMTModel class referenced above, the weights are tied simply by assigning references on lines 98 and 101: `self.src_embed = src_embed` and `self.tgt_embed = self.src_embed`.

Is this also a valid way of tying weights?
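For concreteness, here is a minimal self-contained sketch of that reference-sharing pattern (the class name and sizes are made up for illustration; this is not the actual NMTModel code):

```python
from mxnet.gluon import nn

class TiedSeq2SeqEmbeddings(nn.HybridBlock):
    """Sketch of the reference-sharing pattern; not the actual NMTModel."""

    def __init__(self, src_embed, prefix=None, params=None):
        super(TiedSeq2SeqEmbeddings, self).__init__(prefix=prefix, params=params)
        # The same child Block is registered under two attribute names, so
        # both attributes point at one Block and one weight matrix.
        self.src_embed = src_embed
        self.tgt_embed = self.src_embed

    def hybrid_forward(self, F, src_tokens, tgt_tokens):
        return self.src_embed(src_tokens), self.tgt_embed(tgt_tokens)


src_embed = nn.Embedding(10000, 300)      # hypothetical sizes
model = TiedSeq2SeqEmbeddings(src_embed)

print(model.tgt_embed is model.src_embed)             # True
print(len(model.collect_params('.*weight').keys()))   # 1: a single shared weight
```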

leezu commented 4 years ago

I'll respond in https://github.com/dmlc/gluon-nlp/issues/996